Abstract

We study asymptotically optimal statistical inference concerning the unknown state of N identical quantum systems, using two complementary approaches: a "poor man's approach" based on the van Trees inequality, and a rather more sophisticated approach using the recently developed quantum form of LeCam's theory of Local Asymptotic Normality.

1 Introduction 

The aim of this paper is to show the rich possibilities for asymptotically optimal statistical inference for "quantum i.i.d. models". Despite the possibly exotic context, mathematical statistics has much to offer, and much that we have learnt, in particular through Jon Wellner's work in semiparametric models and nonparametric maximum likelihood estimation, can be put to extremely good use. Exotic? In today's quantum information engineering, measurement and estimation schemes are put to work to recover the state of a small number of quantum systems, engineered by the physicist in his or her laboratory. New technologies are winking at us on the horizon. So far, the physicists are largely re-inventing statistical wheels themselves. We think it is a pity statisticians are not more involved. If Jon is looking for some new challenges...?

In this paper we do theory. We suppose that one has $N$ copies of a quantum system, each in the same state depending on an unknown vector of parameters $\theta$, and one wishes to estimate $\theta$, or more generally a vector function of the parameters $\psi(\theta)$, by making some measurement on the $N$ systems together. This yields data whose distribution depends on $\theta$ and on the choice of the measurement. Given the measurement, we therefore have a classical parametric statistical model, though not necessarily an i.i.d. model, since we are allowed to bring the $N$ systems together before measuring the resulting joint system as one quantum object. In that case the resulting data need not consist of (a function of) $N$ i.i.d. observations, and a key quantum feature is that we can generally extract more information about $\theta$ using such "collective" or "joint" measurements than when we measure the systems separately. What is the best we can do as $N \to \infty$, when we are allowed to optimize both over the measurement and over the ensuing data-processing?

A statistically motivated approach to deriving methods with good properties for large $N$ is to choose the measurement to optimize the Fisher information in the data, leaving it to the statistician to process the data efficiently, using for instance maximum likelihood or related methods, including Bayesian. This heuristic principle has already been shown to work in a number of special cases in quantum statistics. Since the measurement maximizing the Fisher information typically depends on the unknown parameter value, this often has to be implemented in a two-step approach, first using a small fraction of the $N$ systems to get a first approximation to the true parameter, and then optimizing on the remaining systems using this rough guess.

The approach favoured by many physicists, on the other hand, is to choose a prior distribution and loss function on grounds of symmetry and physical interpretation, and then to exactly optimize the Bayes risk over all measurements and estimators, for any given $N$. This approach succeeds in producing attractive methods on those rare occasions when a felicitous combination of all the mathematical ingredients leads to an analytically tractable solution.

Now it has been observed in a number of problems that the two ap- 
proaches result in asymptotically equivalent estimators, though the mea- 
surement schemes can be strikingly different. Heuristically, this can be 
understood to follow from the fact that, in the physicists' approach, for 
large N the prior distribution should become increasingly irrelevant and the 
Bayes optimal estimator close to the maximum likelihood estimator. More- 
over, we expect those estimators to be asymptotically normal with variances 
corresponding to inverse Fisher information. 

Here we link the two approaches by deriving an asymptotic lower bound 
on the Bayes risk of the physicists' approach, in terms of the optimal Fisher 
information of the statisticians' approach. Sometimes one can find in this 
way asymptotically optimal solutions which are much easier to implement 
than the exactly optimal solution of the physicists' approach. On the other 
hand, it also suggests that the physicists' approach, when successful, leads 
to procedures which are asymptotically optimal for other prior distributions, 
and other loss functions, than those used in the computation. It also suggests 
that these solutions are asymptotically optimal in a pointwise rather than a 
Bayesian sense. 
In the first part of our paper, we derive our new bound by combining an existing quantum Cramér-Rao bound (Holevo, 1982) with the van Trees inequality, a Bayesian Cramér-Rao bound from classical statistics (van Trees, 1968; Gill and Levit, 1995). The former can be interpreted as a bound on the Fisher information in an arbitrary measurement on a quantum system; the latter is a bound on the Bayes risk (for a quadratic loss function) in terms of the Fisher information in the data. This part of the paper can be understood without any familiarity with quantum statistics. Applications are given in an appendix to an eprint version of the paper at arXiv.org.

The paper contains only a brief summary of "what is a quantum statistical model"; for more information the reader is referred to the papers of Barndorff-Nielsen et al. (2003) and Gill (2001). For an overview of the "state of the art" in quantum asymptotic statistics see Hayashi (2005), which reprints papers of many authors together with introductions by the editor.

After this "simplistic" part of the paper we present some of the recently 
developed theory of quantum Local Asymptotic Normality (also mentioning 
a number of open problems). This provides an alternative but more sophis- 
ticated route to getting asymptotic optimality results, but at the end of the 
day it also explains "why" our simplistic approach does indeed work. In 
classical statistics, we have learnt to understand asymptotic optimality of 
maximum likelihood estimation through the idea that an i.i.d. parametric 
model can be closely approximated, locally, by a Gaussian shift model with 
the same information matrix. To say the same thing in a deeper way, the 
two models have the same geometric structure of the score functions of one- 
dimensional sub-models; and in the i.i.d. case, after local rescaling, those 
score functions are asymptotically Gaussian. 

Let us first develop enough notation to state the main result of the paper and compare it with the comparable result from classical statistics. Starting on familiar ground with the latter, suppose we want to estimate a function $\psi(\theta)$ of a parameter $\theta$, both represented by real column vectors of possibly different dimension, based on $N$ i.i.d. observations from a distribution with Fisher information matrix $I(\theta)$. Let $\pi$ be a prior density on the parameter space and let $\tilde G(\theta)$ be a symmetric positive-definite matrix defining a quadratic loss function $l(\hat\psi^{(N)}, \theta) = (\hat\psi^{(N)} - \psi(\theta))^{\top} \tilde G(\theta) (\hat\psi^{(N)} - \psi(\theta))$. (Later we will use $G(\theta)$, without the tilde, in the special case when $\psi$ is $\theta$ itself.) Define the mean square error matrix $V^{(N)}(\theta) = E_\theta (\hat\psi^{(N)} - \psi(\theta))(\hat\psi^{(N)} - \psi(\theta))^{\top}$, so that the risk can be written $R^{(N)}(\theta) = \operatorname{trace} \tilde G(\theta) V^{(N)}(\theta)$. The Bayes risk is $R^{(N)}(\pi) = E_\pi \operatorname{trace} \tilde G V^{(N)}$. Here, $E_\theta$ denotes expectation over the data for given $\theta$, and $E_\pi$ denotes averaging over $\theta$ with respect to the prior $\pi$. The estimator $\hat\psi^{(N)}$ is completely arbitrary. We assume the prior density to be smooth, compactly supported and zero on the smooth boundary of its support. Furthermore a certain quantity, roughly interpreted as "information in the prior", must be finite. Then it is very easy to show (Gill and Levit, 1995), using the van Trees inequality, that under minimal smoothness conditions on the statistical model,

$$ \liminf_{N\to\infty} N R^{(N)}(\pi) \;\ge\; E_\pi \operatorname{trace} G I^{-1} \tag{1} $$

where $G = \psi'^{\top} \tilde G \psi'$ and $\psi'$ is the matrix of partial derivatives of elements of $\psi$ with respect to those of $\theta$.

Now in quantum statistics the data depend on the choice of measurement, and the measurement should be tuned to the loss function. Given a measurement $M^{(N)}$ on $N$ copies of the quantum system, denote by $I_M^{(N)}(\theta)$ the average Fisher information (i.e., Fisher information divided by $N$) in the data. The Holevo (1982) quantum Cramér-Rao bound, as extended by Hayashi and Matsumoto (2004) to the quantum i.i.d. model, can be expressed as saying that, for all $\theta$, $G$, $N$ and $M^{(N)}$,

$$ \operatorname{trace} G(\theta) \big( I_M^{(N)}(\theta) \big)^{-1} \;\ge\; \mathcal C_G(\theta) \tag{2} $$

for a certain quantity $\mathcal C_G(\theta)$, which depends on the specification of the quantum statistical model (state of one copy, derivatives of the state with respect to parameters, and loss function $G$) at the point $\theta$ only, i.e., on local or pointwise model features (see (7) below).

We aim to prove that under minimal smoothness conditions on the quantum statistical model, and conditions on the prior similar to those needed in the classical case, but under essentially no conditions on the estimator-and-measurement sequence,

$$ \liminf_{N\to\infty} N R^{(N)}(\pi) \;\ge\; E_\pi \mathcal C_G \tag{3} $$

where, as before, $G = \psi'^{\top} \tilde G \psi'$. The main result is exactly the bound one would hope for, from heuristic statistical principles. In specific models of interest, the right hand side is often easy to calculate. Various specific measurement-and-estimator sequences, motivated by a variety of approaches, can also be shown in interesting examples to achieve the bound; see the appendix to the eprint version of this paper.



It was also shown in Gill and Levit (1995) how, in the classical statistical context, one can replace a fixed prior $\pi$ by a sequence of priors indexed by $N$, concentrating more and more on a fixed parameter value $\theta_0$, at rate $1/\sqrt N$. Following their approach would, in the quantum context, lead to the pointwise asymptotic lower bounds

$$ \liminf_{N\to\infty} N R^{(N)}(\theta) \;\ge\; \mathcal C_G(\theta) \tag{4} $$

for each $\theta$, for regular estimators, and to local asymptotic minimax bounds

$$ \lim_{M\to\infty} \liminf_{N\to\infty} \sup_{\|\theta - \theta_0\| \le N^{-1/2} M} N R^{(N)}(\theta) \;\ge\; \mathcal C_G(\theta_0) \tag{5} $$

for all estimators, but we do not further develop that theory here. In classical statistics the theory of Local Asymptotic Normality is the way to unify, generalise, and understand this kind of result. In the last section of this paper we introduce the now emerging quantum generalization of this theory.

The basic tools used in the first part of this paper have now all been mentioned, but as we shall see, the proof is not a routine application of the van Trees inequality. The missing ingredient will be provided by the following new bound, dual to (2): for all $\theta$, $K$, $N$ and $M^{(N)}$,

$$ \operatorname{trace} K(\theta) I_M^{(N)}(\theta) \;\le\; \mathcal C^K(\theta) \tag{6} $$

where $\mathcal C^K(\theta)$ actually equals $\mathcal C_G(\theta)$ for a certain $G$ defined in terms of $K$ (as explained in Theorem 2 below). This is an upper bound on Fisher information, in contrast to (2), which is a lower bound on inverse Fisher information. The new inequality (6) follows from the convexity of the sets of information matrices and of inverse information matrices for arbitrary measurements on a quantum system, and these convexity properties have a simple statistical explanation. Such dual bounds have cropped up incidentally in quantum statistics, for instance in Gill and Massar (2000), but this is the first time a connection is established.

The arguments for (6), and then, given (6), for (3), are based on some general structural features of quantum statistics, and hence it is not necessary to be familiar with the technical details of the set-up.

In the next section we will summarize the i.i.d. model in quantum statistics, focussing on the key facts which will be used in the proof of the dual Holevo bound (6) and of our main result, the asymptotic lower bound (3). These proofs are given in a subsequent section, where no further "quantum" arguments will be used.

In the final section we will show how the bounds correspond to recent results in the theory of Q-LAN, according to which the i.i.d. model converges to a quantum Gaussian shift experiment, with the same Holevo bounds, which are actually attainable in the Gaussian case. An eprint version of this paper, Gill and Guţă (2012), includes an appendix with some worked examples.



2 Quantum statistics: the i.i.d. parametric case. 

The basic objects in quantum statistics are states and measurements, de- 
fined in terms of certain operators on a complex Hilbert space. To avoid 
technical complications we restrict attention to the finite-dimensional case, 
already rich in structure and applications, when operators are represented 
by ordinary (complex) matrices. 
States and measurement. The state of a $d$-dimensional system is represented by a $d \times d$ matrix $\rho$, called the density matrix of the state, having the following properties: $\rho^* = \rho$ (self-adjoint or Hermitian), $\rho \ge \mathbf 0$ (non-negative), $\operatorname{trace}(\rho) = 1$ (normalized). "Non-negative" actually implies "self-adjoint", but it does no harm to emphasize both properties. $\mathbf 0$ denotes the zero matrix; $\mathbf 1$ will denote the identity matrix.

Example: when $d = 2$, every density matrix can be written in the form $\rho = \tfrac12(\mathbf 1 + \theta_1\sigma_1 + \theta_2\sigma_2 + \theta_3\sigma_3)$, where

$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} $$

are the three Pauli matrices and where $\theta_1^2 + \theta_2^2 + \theta_3^2 \le 1$. □

"Quantum statistics" concerns the situation when the state of the system $\rho(\theta)$ depends on a (column) vector $\theta$ of $p$ unknown (real) parameters.

Example: a completely unknown two-dimensional quantum state depends on a vector of three real parameters, $\theta = (\theta_1, \theta_2, \theta_3)^{\top}$, known to lie in the unit ball. Various interesting submodels can be described geometrically: e.g., the equatorial plane; the surface of the ball; a straight line through the origin. More generally, a completely unknown $d$-dimensional state depends on $p = d^2 - 1$ real parameters. □
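The qubit parametrisation in these examples is easy to check numerically. The following minimal sketch (plain Python with matrices as nested lists; all function names are our own, hypothetical choices) builds $\rho(\theta)$ from the Pauli matrices and tests the three defining properties of a density matrix. Since the eigenvalues of $\rho(\theta)$ are $(1 \pm |\theta|)/2$, non-negativity holds exactly when $\theta$ lies in the unit ball.

```python
import math

# Pauli matrices and the identity as nested lists (row-major).
SIGMA = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]
IDENTITY = [[1, 0], [0, 1]]


def density_matrix(theta):
    """rho(theta) = (1 + theta_1 sigma_1 + theta_2 sigma_2 + theta_3 sigma_3) / 2."""
    return [[0.5 * (IDENTITY[r][c] + sum(t * s[r][c] for t, s in zip(theta, SIGMA)))
             for c in range(2)] for r in range(2)]


def eigenvalues_2x2_hermitian(m):
    """Both (real) eigenvalues of a 2x2 Hermitian matrix, from its trace and determinant."""
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    half_gap = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 - half_gap, tr / 2 + half_gap


def is_density_matrix(m, tol=1e-12):
    """Check self-adjointness, unit trace and non-negativity of a 2x2 matrix."""
    hermitian = all(abs(m[r][c] - complex(m[c][r]).conjugate()) < tol
                    for r in range(2) for c in range(2))
    unit_trace = abs(m[0][0] + m[1][1] - 1) < tol
    nonnegative = min(eigenvalues_2x2_hermitian(m)) > -tol
    return hermitian and unit_trace and nonnegative
```

For instance, $\theta = (0.9, 0.9, 0)$ lies outside the unit ball and gives a negative eigenvalue, so `is_density_matrix` rejects it.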

Example: in the previous example the two-parameter case obtained by demanding that $\theta_1^2 + \theta_2^2 + \theta_3^2 = 1$ is called the case of a two-dimensional pure state. In general, a state is called pure if $\rho^2 = \rho$, or equivalently if $\rho$ has rank one. A completely unknown pure $d$-dimensional state depends on $p = 2(d - 1)$ real parameters. □
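For a qubit, $\operatorname{trace}(\rho^2) = (1 + |\theta|^2)/2$, so $\rho^2 = \rho$ holds exactly on the surface of the ball. The sketch below (hypothetical helper names, plain Python) checks this numerically and evaluates the two parameter counts quoted in the examples above.

```python
SIGMA = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]
IDENTITY = [[1, 0], [0, 1]]


def density_matrix(theta):
    """rho(theta) = (1 + theta_1 sigma_1 + theta_2 sigma_2 + theta_3 sigma_3) / 2."""
    return [[0.5 * (IDENTITY[r][c] + sum(t * s[r][c] for t, s in zip(theta, SIGMA)))
             for c in range(2)] for r in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def purity(theta):
    """trace(rho^2); for a qubit this equals (1 + |theta|^2) / 2."""
    rho = density_matrix(theta)
    sq = matmul(rho, rho)
    return (sq[0][0] + sq[1][1]).real


def is_pure(theta, tol=1e-12):
    """Pure means rho^2 = rho, equivalently rank one."""
    rho = density_matrix(theta)
    sq = matmul(rho, rho)
    return all(abs(sq[r][c] - rho[r][c]) < tol for r in range(2) for c in range(2))


def n_params_general(d):
    """Real parameters of a completely unknown d-dimensional state: d^2 - 1."""
    return d * d - 1


def n_params_pure(d):
    """Real parameters of a completely unknown pure d-dimensional state: 2(d - 1)."""
    return 2 * (d - 1)
```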

A measurement on a quantum system is characterized by the outcome space, which is just a measurable space $(\mathcal X, \mathcal B)$, and a positive operator valued measure (POVM) $M$ on this space. This means that to each $B \in \mathcal B$ there corresponds a $d \times d$ non-negative self-adjoint matrix $M(B)$, together having the usual properties of an ordinary (real) measure (sigma-additive), with moreover $M(\mathcal X) = \mathbf 1$. The probability distribution of the outcome of doing measurement $M$ on state $\rho(\theta)$ is given by the Born law, or trace rule: $\Pr(\text{outcome} \in B) = \operatorname{trace}(\rho(\theta) M(B))$. It can be seen that this is indeed a bona fide probability distribution on the sample space $(\mathcal X, \mathcal B)$. Moreover it has a density with respect to the finite real measure $\operatorname{trace}(M(B))$.

Example: the simplest measurement is defined by choosing an orthonormal basis of $\mathbb C^d$, say $\psi_1, \dots, \psi_d$, taking the outcome space to be the discrete space $\mathcal X = \{1, \dots, d\}$, and defining $M(\{x\}) = \psi_x \psi_x^*$ for $x \in \mathcal X$; or in physicists' notation, $M(\{x\}) = |\psi_x\rangle\langle\psi_x|$. One computes that $\Pr(\text{outcome} = x) = \psi_x^* \rho(\theta) \psi_x = \langle\psi_x|\rho|\psi_x\rangle$. If the state is pure then $\rho = \phi\phi^* = |\phi\rangle\langle\phi|$ for some $\phi = \phi(\theta) \in \mathbb C^d$ of length 1 and depending on the parameter $\theta$. One finds that $\Pr(\text{outcome} = x) = |\psi_x^* \phi|^2 = |\langle\psi_x|\phi\rangle|^2$. □
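A minimal sketch of the trace rule for the orthonormal-basis measurement just described, in plain Python with matrices as nested lists (all names are illustrative assumptions). It returns $\psi_x^* \rho \psi_x$ for each basis vector; for a pure state $\rho = \phi\phi^*$ the result can be compared with $|\langle\psi_x|\phi\rangle|^2$.

```python
import math


def qubit_rho(t1, t2, t3):
    """rho = (1 + t1 sigma_1 + t2 sigma_2 + t3 sigma_3) / 2, written out entrywise."""
    return [[(1 + t3) / 2, (t1 - 1j * t2) / 2],
            [(t1 + 1j * t2) / 2, (1 - t3) / 2]]


def pure_state(phi):
    """rho = phi phi^* for a unit vector phi."""
    return [[phi[r] * complex(phi[c]).conjugate() for c in range(len(phi))]
            for r in range(len(phi))]


def born_probabilities(rho, basis):
    """Pr(outcome = x) = psi_x^* rho psi_x for each vector psi_x of an orthonormal basis."""
    d = len(rho)
    probs = []
    for psi in basis:
        rho_psi = [sum(rho[r][c] * psi[c] for c in range(d)) for r in range(d)]
        amplitude = sum(complex(psi[r]).conjugate() * rho_psi[r] for r in range(d))
        probs.append(amplitude.real)  # the imaginary part vanishes up to rounding
    return probs


# Two example bases of C^2: the computational basis and the eigenbasis of sigma_1.
COMPUTATIONAL = [(1, 0), (0, 1)]
H = 1 / math.sqrt(2)
SIGMA1_BASIS = [(H, H), (H, -H)]
```

With the qubit parametrisation of the earlier example, measuring in the computational basis gives probabilities $(1 \pm \theta_3)/2$, and measuring in the $\sigma_1$ eigenbasis gives $(1 \pm \theta_1)/2$.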

So far we have discussed state and measurement for a single quantum system. This also encompasses the case of $N$ copies of the system, via a tensor product construction, which we will now summarize. The joint state of $N$ identical copies of a single system having state $\rho(\theta)$ is $\rho(\theta)^{\otimes N}$, a density matrix on a space of dimension $d^N$. A joint or collective measurement on these $N$ systems is specified by a POVM on this large tensor product Hilbert space. An important point is that joint measurements give many more possibilities than measuring the separate systems independently, or even measuring the separate systems adaptively.
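The tensor product construction can be sketched with an explicit Kronecker product. The code below (plain Python; the helper names and the particular single-qubit state are our own illustrative choices) forms $\rho^{\otimes N}$ and confirms that it has dimension $d^N$, unit trace, and product structure on its diagonal.

```python
def kron(a, b):
    """Kronecker (tensor) product of square matrices a (n x n) and b (m x m)."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def tensor_power(rho, n_copies):
    """rho tensored with itself n_copies times: the joint state of N identical systems."""
    out = [[1]]
    for _ in range(n_copies):
        out = kron(out, rho)
    return out


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


# An arbitrary valid single-qubit density matrix (Hermitian, trace 1, determinant 0.11 > 0).
RHO = [[0.8, 0.1 - 0.2j], [0.1 + 0.2j, 0.2]]
```

For three copies the joint state is already an $8 \times 8$ matrix, and a collective measurement is a POVM on this whole space rather than on the three $2 \times 2$ factors separately.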

Fact to remember 1. State plus measurement determines probability dis- 
tribution of data. 



Quantum Cramér-Rao bound. Our main input is going to be the Holevo (1982) quantum Cramér-Rao bound, with its extension to the i.i.d. case due to Hayashi and Matsumoto (2004).



Precisely because of quantum phenomena, different measurements, incompatible with one another, are appropriate when we are interested in different components of our parameter, or more generally, in different loss functions. The bound concerns estimation of $\theta$ itself rather than a function thereof, and depends on a quadratic loss function defined by a symmetric real non-negative matrix $G(\theta)$ which may depend on the actual parameter value $\theta$. For a given estimator $\hat\theta^{(N)}$ computed from the outcome of some measurement $M^{(N)}$ on $N$ copies of our system, define its mean square error matrix $V^{(N)}(\theta) = E_\theta (\hat\theta^{(N)} - \theta)(\hat\theta^{(N)} - \theta)^{\top}$. The risk function when using the quadratic loss determined by $G$ is $R^{(N)}(\theta) = E_\theta (\hat\theta^{(N)} - \theta)^{\top} G(\theta) (\hat\theta^{(N)} - \theta) = \operatorname{trace}(G(\theta) V^{(N)}(\theta))$.
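The last identity is just $x^{\top} G x = \operatorname{trace}(G x x^{\top})$, averaged over the data. The sketch below checks it for a hypothetical discrete distribution of estimation errors $\hat\theta - \theta$; nothing quantum is involved, and all names and numbers are illustrative.

```python
def mse_matrix(errors, weights):
    """V = sum_k w_k e_k e_k^T for errors e_k = theta_hat_k - theta with probabilities w_k."""
    p = len(errors[0])
    return [[sum(w * e[i] * e[j] for e, w in zip(errors, weights)) for j in range(p)]
            for i in range(p)]


def risk_direct(errors, weights, G):
    """E (theta_hat - theta)^T G (theta_hat - theta), averaged outcome by outcome."""
    p = len(G)
    return sum(w * sum(e[i] * G[i][j] * e[j] for i in range(p) for j in range(p))
               for e, w in zip(errors, weights))


def risk_via_trace(G, V):
    """trace(G V)."""
    p = len(G)
    return sum(G[i][j] * V[j][i] for i in range(p) for j in range(p))


# Hypothetical toy inputs: three possible error vectors, their probabilities, and a loss matrix.
ERRORS = [(0.5, -0.2), (-0.1, 0.3), (0.0, -0.4)]
WEIGHTS = [0.2, 0.5, 0.3]
G = [[2.0, 0.5], [0.5, 1.0]]
```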

One may expect the risk of good measurements-and-estimators to decrease like $N^{-1}$ as $N \to \infty$. The quantum Cramér-Rao bound confirms that this is the best rate to hope for: it states that for unbiased estimators of a $p$-dimensional parameter $\theta$, based on arbitrary joint measurements on $N$ copies,

$$ N R^{(N)}(\theta) \;\ge\; \mathcal C_G(\theta) = \inf_{\vec X,\, V:\; V \ge Z(\vec X)} \operatorname{trace}\big(G(\theta) V\big) \tag{7} $$

where $\vec X = (X_1, \dots, X_p)$, the $X_i$ are $d \times d$ self-adjoint matrices satisfying

$$ \frac{\partial}{\partial\theta_i} \operatorname{trace}\big(\rho(\theta) X_j\big) = \delta_{ij}, \tag{8} $$

$Z(\vec X)$ is the $p \times p$ self-adjoint matrix with elements $\operatorname{trace}(\rho(\theta) X_i X_j)$, and $V$ is a real symmetric matrix. It is possible to solve the optimization over $V$ for given $\vec X$, leading to the formula

$$ \mathcal C_G(\theta) = \inf_{\vec X} \operatorname{trace}\Big( \operatorname{Re}\big(G^{1/2} Z(\vec X) G^{1/2}\big) + \operatorname{abs} \operatorname{Im}\big(G^{1/2} Z(\vec X) G^{1/2}\big) \Big) \tag{9} $$

where $G = G(\theta)$. The absolute value of a matrix is found by diagonalising it and taking absolute values of the eigenvalues. We'll assume that the bound is finite, i.e., there exists $\vec X$ satisfying the constraints. A sufficient condition for this is that the Helstrom quantum information matrix $H$ introduced in (27) below is nonsingular.
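Formula (9) is easy to evaluate for a fixed choice of $\vec X$, and any $\vec X$ satisfying (8) gives an upper bound on the infimum $\mathcal C_G(\theta)$. The sketch below does this for the completely unknown qubit of the earlier examples, with $G$ taken to be the identity and the candidate $X_i = \sigma_i - \theta_i \mathbf 1$; both choices are assumptions made for illustration, and the code does not attempt the minimisation over $\vec X$. For this candidate one can check by hand that $Z = \mathbf 1 - \theta\theta^{\top} + i E(\theta)$ with $E_{ij} = \sum_k \epsilon_{ijk}\theta_k$, so the objective equals $3 - |\theta|^2 + 2|\theta|$.

```python
import math

SIGMA = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]
IDENTITY = [[1, 0], [0, 1]]


def lin(a, b, s):
    """a + s * b for 2x2 matrices."""
    return [[a[r][c] + s * b[r][c] for c in range(2)] for r in range(2)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def trace(a):
    return complex(a[0][0] + a[1][1])


def rho(theta):
    """Qubit state (1 + theta . sigma) / 2."""
    out = [[0.5 * IDENTITY[r][c] for c in range(2)] for r in range(2)]
    for t, s in zip(theta, SIGMA):
        out = lin(out, s, 0.5 * t)
    return out


def candidate_X(theta):
    """Hypothetical feasible choice X_i = sigma_i - theta_i * 1 (not claimed optimal)."""
    return [lin(s, IDENTITY, -t) for t, s in zip(theta, SIGMA)]


def constraint_matrix(theta, X, h=1e-6):
    """D[i][j] ~ d/dtheta_i trace(rho(theta) X_j) by central differences; (8) asks D = 1."""
    rows = []
    for i in range(3):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        rows.append([(trace(matmul(rho(up), Xj)) - trace(matmul(rho(dn), Xj))).real / (2 * h)
                     for Xj in X])
    return rows


def Z_matrix(theta, X):
    """Z_ij = trace(rho X_i X_j), a 3x3 Hermitian matrix."""
    r = rho(theta)
    return [[trace(matmul(r, matmul(Xi, Xj))) for Xj in X] for Xi in X]


def holevo_objective_identity_G(Z):
    """trace Re Z + trace abs Im Z: the objective of (9) with G = identity.

    Im Z is real antisymmetric, so i * Im Z is Hermitian with eigenvalues 0 and +-|a|,
    where a is the axial vector of Im Z; hence trace abs Im Z = 2 |a|.
    """
    re_part = sum(Z[i][i].real for i in range(3))
    a = (Z[1][2].imag, Z[2][0].imag, Z[0][1].imag)
    return re_part + 2 * math.sqrt(sum(v * v for v in a))
```

At $\theta = (0.3, 0.4, 0)$ the objective is $3 - 0.25 + 1 = 3.75$. Computing $\mathcal C_G$ itself would require a separate numerical optimisation over $\vec X$.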

For specific interesting models, it often turns out not to be difficult to compute the bound $\mathcal C_G(\theta)$. Note that it is a bound which depends only on the density matrix of one system ($N = 1$) and its derivative with respect to the parameter, and on the loss function, both at the given point $\theta$. It can be found by solving a finite-dimensional optimization problem.

We will not be concerned with the specific form of the bound. What we are going to need are just two key properties.

Firstly: the bound is local, and applies to the larger class of locally unbiased estimators. This means that at the given point $\theta$, $E_\theta \hat\theta = \theta$, and at this point also $\partial/\partial\theta_i \, E_\theta \hat\theta_j = \delta_{ij}$. Now, it is well known that the "estimator" $\theta_0 + I(\theta_0)^{-1} S(\theta_0)$, where $I(\theta)$ is Fisher information and $S(\theta)$ is the score function, is locally unbiased at $\theta = \theta_0$ and achieves the Cramér-Rao bound there. Thus the Cramér-Rao bound for locally unbiased estimators is sharp. Consequently, we can rewrite the bound (7) in the form (2) announced above, where $I_M^{(N)}(\theta)$ is the average (divided by $N$) Fisher information in the outcome of an arbitrary measurement $M = M^{(N)}$ on $N$ copies and the right hand side is defined in (7) or (9).
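To make the locally unbiased one-step "estimator" concrete, the sketch below takes a deliberately simple classical case (an assumption for illustration): a single qubit whose only unknown coordinate is $\theta = \theta_3$, measured in the basis of $\sigma_3$, so that the outcome $x = \pm 1$ has probability $(1 + x\theta)/2$. It checks numerically that $\theta_0 + I(\theta_0)^{-1} S(\theta_0)$ has mean $\theta_0$, derivative of the mean equal to 1, and variance $I(\theta_0)^{-1}$ at $\theta_0$. All names are hypothetical.

```python
OUTCOMES = (-1, 1)


def prob(x, theta):
    """Pr(outcome = x) = (1 + x theta) / 2 for the sigma_3 measurement on a qubit."""
    return (1 + x * theta) / 2


def score(x, theta):
    """d/dtheta log Pr(outcome = x)."""
    return x / (1 + x * theta)


def fisher(theta):
    """Fisher information sum_x Pr(x) S(x)^2, which equals 1 / (1 - theta^2) here."""
    return sum(prob(x, theta) * score(x, theta) ** 2 for x in OUTCOMES)


def one_step(x, theta0):
    """The "estimator" theta0 + I(theta0)^{-1} S(theta0), evaluated at outcome x."""
    return theta0 + score(x, theta0) / fisher(theta0)


def mean(theta0, theta):
    """E_theta of the one-step estimator built at theta0."""
    return sum(prob(x, theta) * one_step(x, theta0) for x in OUTCOMES)


def variance(theta0, theta):
    m = mean(theta0, theta)
    return sum(prob(x, theta) * (one_step(x, theta0) - m) ** 2 for x in OUTCOMES)
```

In this toy model the one-step estimator happens to reduce to the outcome $x$ itself, which is unbiased everywhere; in general it is only locally unbiased at $\theta_0$.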

Fact to remember 2. We have a family of computable lower bounds on the inverse average Fisher information matrix for an arbitrary measurement on $N$ copies, given by (2) and (7) or (9).

Secondly, for given $\theta$, define the following two sets of positive-definite symmetric real matrices, in one-to-one correspondence with one another through the mapping "matrix inverse". The matrices $G$ occurring in the definitions are also taken to be positive-definite symmetric real.

$$ \mathcal V = \{ V : \operatorname{trace}(G V) \ge \mathcal C_G \ \ \forall\, G \} \tag{10} $$

$$ \mathcal I = \{ I : \operatorname{trace}(G I^{-1}) \ge \mathcal C_G \ \ \forall\, G \} \tag{11} $$

Elsewhere (Gill, 2005) we have given a proof by matrix algebra that the set $\mathcal I$ is convex (for $\mathcal V$, convexity is obvious), and that the inequalities defining $\mathcal V$ define supporting hyperplanes to that convex set, i.e., all the inequalities are achievable in $\mathcal V$, or equivalently $\mathcal C_G = \inf_{V \in \mathcal V} \operatorname{trace}(GV)$. But now, with the tools of Q-LAN behind us (well, ahead of us; see the last section of this paper), we can give a short, statistical explanation which is simultaneously a short, complete proof.

The quantum statistical problem of collective measurements on $N$ identical quantum systems, when rescaled at the proper $\sqrt N$-rate, approaches a quantum Gaussian problem as $N \to \infty$, as we will see in the last section of this paper. In this problem, $\mathcal V$ consists precisely of all the covariance matrices of locally unbiased estimators achievable (by suitable choice of measurement) in the limiting $p$-parameter quantum Gaussian statistical model. The inequalities defining $\mathcal V$ are exactly the Holevo bounds for that model, and each of those bounds, as we show in Section 4, is attainable. Thus, for each $G$, there exists a $V \in \mathcal V$ achieving equality in $\operatorname{trace}(GV) \ge \mathcal C_G$. It follows from this that $\mathcal I$ consists of all non-singular information matrices (augmented with all non-singular matrices smaller than an information matrix) achievable by choice of measurement on the same quantum Gaussian model. Consider the set of information matrices attainable by some measurement, together with all smaller matrices; and consider the set of variance matrices of locally unbiased estimators based on arbitrary measurements, together with all larger matrices. Adding zero-mean noise to a locally unbiased estimator preserves its local unbiasedness, so adding larger matrices to the latter set does not change it, by the mathematical definition of measurement, which includes addition of outcomes of arbitrary auxiliary randomization. The set of information matrices is convex: choosing measurement 1 with probability $p$ and measurement 2 with probability $q = 1 - p$, while remembering your choice, gives a measurement whose Fisher information is the convex combination of the informations of measurements 1 and 2. Augmenting the set with all matrices smaller than something in the set preserves convexity. The set of variances of locally unbiased estimators is convex, by a similar randomization argument. Putting this together, we obtain

Fact to remember 3. For given $\theta$, both $\mathcal V$ and $\mathcal I$ defined in (10) and (11) are convex, and all the inequalities defining these sets are achieved by points in the sets.
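The randomization argument for convexity of the information set can be seen in a small toy model, chosen by us for illustration: a qubit in the x-z plane, $\rho = \tfrac12(\mathbf 1 + t_1\sigma_1 + t_3\sigma_3)$ with $\theta = (t_1, t_3)$. Measurement 1 is in the $\sigma_1$ eigenbasis, measurement 2 in the $\sigma_3$ eigenbasis; the randomized measurement picks measurement 1 with probability $p$ and records the choice. The sketch computes Fisher information as $\sum (\partial_i P)(\partial_j P)/P$ over outcomes and confirms it equals $p I_1 + (1 - p) I_2$. Function names are hypothetical.

```python
def randomised_measurement(theta, p):
    """Outcome table for: with probability p measure sigma_1, else sigma_3, and record which.

    Each row is (probability, gradient of that probability with respect to theta).
    """
    t1, t3 = theta
    q = 1 - p
    rows = []
    for x in (-1, 1):
        rows.append((p * (1 + x * t1) / 2, (p * x / 2, 0.0)))  # chose sigma_1, saw x
        rows.append((q * (1 + x * t3) / 2, (0.0, q * x / 2)))  # chose sigma_3, saw x
    return rows


def fisher_matrix(rows):
    """I_ij = sum over outcomes of (d_i P)(d_j P) / P, skipping zero-probability outcomes."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for prob, grad in rows:
        if prob > 0:
            for i in range(2):
                for j in range(2):
                    info[i][j] += grad[i] * grad[j] / prob
    return info
```

Setting $p = 1$ or $p = 0$ recovers the two pure strategies, with informations $\operatorname{diag}(1/(1 - t_1^2), 0)$ and $\operatorname{diag}(0, 1/(1 - t_3^2))$.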

3 An asymptotic Bayesian information bound 

We will now introduce the van Trees inequality, a Bayesian Cramér-Rao bound, and combine it with the Holevo bound (2) via derivation of a dual bound following from the convexity of the sets (10) and (11). We return to the problem of estimating the (real, column) vector function $\psi(\theta)$ of the (real, column) vector parameter $\theta$ of a state $\rho(\theta)$ based on collective measurements of $N$ identical copies. The dimensions of $\psi$ and of $\theta$ need not be the same. The sample size $N$ is largely suppressed from the notation. Let $V$ be the mean square error matrix of an arbitrary estimator $\hat\psi$, thus $V(\theta) = E_\theta (\hat\psi - \psi(\theta))(\hat\psi - \psi(\theta))^{\top}$. Often, but not necessarily, we'll have $\hat\psi = \psi(\hat\theta)$ for some estimator $\hat\theta$ of $\theta$. Suppose we have a quadratic loss function $(\hat\psi - \psi(\theta))^{\top} G(\theta) (\hat\psi - \psi(\theta))$, where $G$ is a positive-definite matrix function of $\theta$; then the Bayes risk with respect to a given prior $\pi$ can be written $R(\pi) = E_\pi \operatorname{trace} G V$. We are going to prove the following theorem:
Theorem 1. Suppose $\{\rho(\theta) : \theta \in \Theta \subseteq \mathbb R^p\}$ is a smooth quantum statistical model and suppose $\pi$ is a smooth prior density on a compact subset $\Theta_0 \subseteq \Theta$ such that $\Theta_0$ has a piecewise smooth boundary, on which $\pi$ is zero. Suppose moreover that the quantity $\mathcal J(\pi)$ defined in (16) below is finite. Then

$$ \liminf_{N\to\infty} N R^{(N)}(\pi) \;\ge\; E_\pi \mathcal C_{G_0} \tag{12} $$

where $G_0 = \psi'^{\top} G \psi'$ (assumed to be positive-definite), $\psi'$ is the matrix of partial derivatives of elements of $\psi$ with respect to those of $\theta$, and $\mathcal C_{G_0}$ is defined by (7) or (9).

"Once continuously differentiable" is enough smoothness. Smoothness of the quantum statistical model implies smoothness of the classical statistical model following from applying an arbitrary measurement to $N$ copies of the quantum state. Slightly weaker but more elaborate smoothness conditions on the statistical model and prior are spelled out in Gill and Levit (1995). The restriction that $G_0$ be non-singular can probably be avoided by a more detailed analysis.

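Let $I_M$ denote the average Fisher information matrix for $\theta$ based on a given collective measurement $M$ on the $N$ copies. Then the van Trees inequality states that for all matrix functions $C$ of $\theta$, of size $\dim(\psi) \times \dim(\theta)$,

$$ E_\pi \operatorname{trace} G V \;\ge\; \frac{\big( E_\pi \operatorname{trace} C \psi'^{\top} \big)^2}{N \, E_\pi \operatorname{trace} G^{-1} C I_M C^{\top} + \int \frac{(C\pi)'^{\top} G^{-1} (C\pi)'}{\pi} \, d\theta} \tag{13} $$

where the primes in $\psi'$ and in $(C\pi)'$ both denote differentiation, but in the first case converting the vector $\psi$ into the matrix of partial derivatives of elements of $\psi$ with respect to elements of $\theta$, of size $\dim(\psi) \times \dim(\theta)$, and in the second case converting the matrix $C\pi$ into the column vector, of the same length as $\psi$, with row elements $\sum_j (\partial/\partial\theta_j)(C\pi)_{ij}$. To get an optimal bound we need to choose $C(\theta)$ cleverly.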
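The role of the prior term is clearest in the scalar case $\psi(\theta) = \theta$, $G = 1$, $C = 1$, where the van Trees inequality reads $E_\pi E_\theta (\hat\theta - \theta)^2 \ge 1/(N E_\pi I_M + \mathcal J(\pi))$ with $\mathcal J(\pi) = \int \pi'(\theta)^2/\pi(\theta)\, d\theta$. The cosine-squared prior on $[-1, 1]$ below is an illustrative assumption: it is smooth and vanishes on the boundary of its support, as Theorem 1 requires, and for it $\mathcal J(\pi)$ equals the square of the circle constant, about 9.87. The sketch approximates this integral numerically; the function names are hypothetical.

```python
import math


def prior(t):
    """Hypothetical prior density cos^2(pi t / 2) on [-1, 1]; zero on the boundary."""
    return math.cos(math.pi * t / 2) ** 2 if -1.0 <= t <= 1.0 else 0.0


def prior_derivative(t):
    """d/dt cos^2(pi t / 2) = -(pi / 2) sin(pi t) inside (-1, 1)."""
    return -(math.pi / 2) * math.sin(math.pi * t) if -1.0 < t < 1.0 else 0.0


def prior_information(n=20000):
    """Midpoint-rule approximation of J(pi) = integral of pi'(t)^2 / pi(t) over (-1, 1)."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        t = -1.0 + (k + 0.5) * h  # midpoints never hit the endpoints, where pi = 0
        total += prior_derivative(t) ** 2 / prior(t) * h
    return total


def van_trees_bound(n_obs, fisher_per_obs, j_prior):
    """Scalar van Trees lower bound 1 / (N E_pi I + J(pi)) on the Bayes mean square error."""
    return 1.0 / (n_obs * fisher_per_obs + j_prior)
```

For example, with $N = 100$ observations carrying unit Fisher information each (as in a Gaussian location model), the bound is $1/(100 + \mathcal J(\pi))$, slightly below the $1/100$ achieved by the sample mean; the prior term becomes negligible as $N$ grows.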

First though, note that the Fisher information appears in the denominator of the van Trees bound. This is a nuisance, since Holevo's bound (2) is a lower bound on the inverse Fisher information. We would like to have an upper bound on the information itself, say of the form (6), together with a recipe for computing $\mathcal C^K$.

All this can be obtained from the convexity of the sets $\mathcal I$ and $\mathcal V$ defined in (11) and (10), and the non-redundancy of the inequalities appearing in their definitions. Suppose $V_0$ is a boundary point of $\mathcal V$. Define $I_0 = V_0^{-1}$. Thus $I_0$ (though not necessarily an attainable average information matrix $I_M^{(N)}$) satisfies the Holevo bound for each positive-definite $G$, and attains equality in one of them, say with $G = G_0$. In the language of convex sets, and "in the $\mathcal V$-picture", $\operatorname{trace} G_0 V = \mathcal C_{G_0}$ is a supporting hyperplane to $\mathcal V$ at $V = V_0$.

Under the mapping "matrix inverse" the hyperplane $\operatorname{trace} G_0 V = \mathcal C_{G_0}$ in the $\mathcal V$-picture maps to the smooth surface $\operatorname{trace} G_0 I^{-1} = \mathcal C_{G_0}$ touching the set $\mathcal I$ at $I_0$ in the $\mathcal I$-picture. Since $\mathcal I$ is convex, the tangent plane to the smooth surface at $I = I_0$ must be a supporting hyperplane to $\mathcal I$ at this point. The matrix derivative of the operation of matrix inversion can be written $dA^{-1}/dx = -A^{-1}(dA/dx)A^{-1}$. This tells us that the equation of the tangent plane is $\operatorname{trace} G_0 I_0^{-1} I I_0^{-1} = \operatorname{trace} G_0 I_0^{-1} = \mathcal C_{G_0}$. Since this is simultaneously a supporting hyperplane to $\mathcal I$, we deduce that for all $I \in \mathcal I$, $\operatorname{trace} G_0 I_0^{-1} I I_0^{-1} \le \mathcal C_{G_0}$. Defining $K_0 = I_0^{-1} G_0 I_0^{-1}$ and $\mathcal C^{K_0} = \mathcal C_{G_0}$, we rewrite this inequality as $\operatorname{trace} K_0 I \le \mathcal C^{K_0}$.
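The matrix-inverse derivative used to find the tangent plane can be checked numerically. The sketch below compares $-A^{-1}(dA/dx)A^{-1}$ with a central finite difference along an arbitrary, hypothetical smooth path $A(x)$ of $2 \times 2$ symmetric positive-definite matrices.

```python
def inv2(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul2(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def A(x):
    """Hypothetical smooth path of symmetric positive-definite matrices (for small x)."""
    return [[2.0 + x, 0.5 * x], [0.5 * x, 1.0 + x * x]]


def dA(x):
    """Entrywise derivative of A(x)."""
    return [[1.0, 0.5], [0.5, 2.0 * x]]


def analytic_derivative_of_inverse(x):
    """-A^{-1} (dA/dx) A^{-1}."""
    a_inv = inv2(A(x))
    prod = matmul2(matmul2(a_inv, dA(x)), a_inv)
    return [[-v for v in row] for row in prod]


def numeric_derivative_of_inverse(x, h=1e-6):
    """Central finite difference of x -> A(x)^{-1}."""
    up, dn = inv2(A(x + h)), inv2(A(x - h))
    return [[(up[r][c] - dn[r][c]) / (2 * h) for c in range(2)] for r in range(2)]
```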

A similar story can be told when we start in the $\mathcal I$-picture with a supporting hyperplane (at $I = I_0$) to $\mathcal I$ of the form $\operatorname{trace} K_0 I = \mathcal C^{K_0}$ for some symmetric positive-definite $K_0$. It maps to the smooth surface $\operatorname{trace} K_0 V^{-1} = \mathcal C^{K_0}$, with tangent plane $\operatorname{trace} K_0 V_0^{-1} V V_0^{-1} = \mathcal C^{K_0}$ at $V = V_0 = I_0^{-1}$. By strict convexity of the function "matrix inverse", the tangent plane touches the smooth surface only at the point $V_0$. Moreover, the smooth surface lies above the tangent plane, but below $\mathcal V$. This makes $V_0$ the unique minimizer of $\operatorname{trace} K_0 V_0^{-1} V V_0^{-1}$ in $\mathcal V$.

It would be useful to extend these computations to allow singular $I$, $G$ and $K$. Anyway, we summarize what we have so far in a theorem.

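Theorem 2. Dual to the Holevo family of lower bounds on average inverse information, $\operatorname{trace} G I_M^{-1} \ge \mathcal C_G$ for each positive-definite $G$, we have a family of upper bounds on information,

$$ \operatorname{trace} K I_M \;\le\; \mathcal C^K \quad \text{for each } K. \tag{14} $$

If $I_0 \in \mathcal I$ satisfies $\operatorname{trace} G_0 I_0^{-1} = \mathcal C_{G_0}$, then with $K_0 = I_0^{-1} G_0 I_0^{-1}$, $\mathcal C^{K_0} = \mathcal C_{G_0}$. Conversely, if $I_0 \in \mathcal I$ satisfies $\operatorname{trace} K_0 I_0 = \mathcal C^{K_0}$, then with $G_0 = I_0 K_0 I_0$, $\mathcal C_{G_0} = \mathcal C^{K_0}$. Moreover, none of the bounds is redundant, in the sense that for all positive-definite $G$ and $K$, $\mathcal C_G = \inf_{V \in \mathcal V} \operatorname{trace}(GV)$ and $\mathcal C^K = \sup_{I \in \mathcal I} \operatorname{trace}(KI)$. The minimizer in the first equation is unique.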

Now we are ready to apply the van Trees inequality. First we make a guess for what the left hand side of (13) should look like, at its best. Suppose we use an estimator $\hat\psi = \psi(\hat\theta)$ where $\hat\theta$ makes optimal use of the information in the measurement $M$. Denote now by $I_M$ the asymptotic normalized Fisher information of a sequence of measurements. Then we expect that the asymptotic normalized covariance matrix $V$ of $\hat\psi$ is equal to $\psi' I_M^{-1} \psi'^{\top}$, and therefore the asymptotic normalized Bayes risk should be $E_\pi \operatorname{trace} G \psi' I_M^{-1} \psi'^{\top} = E_\pi \operatorname{trace} \psi'^{\top} G \psi' I_M^{-1}$. This is bounded below by the integrated Holevo bound $E_\pi \mathcal C_{G_0}$ with $G_0 = \psi'^{\top} G \psi'$. Let $I_0 \in \mathcal I$ satisfy $\operatorname{trace} G_0 I_0^{-1} = \mathcal C_{G_0}$; its existence and uniqueness are given by Theorem 2. (Heuristically we expect that $I_0$ is asymptotically attainable.) By the same theorem, with $K_0 = I_0^{-1} G_0 I_0^{-1}$, $\mathcal C^{K_0} = \mathcal C_{G_0} = \operatorname{trace} G_0 I_0^{-1} = \operatorname{trace} \psi'^{\top} G \psi' I_0^{-1}$.
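Though these calculations are informal, they lead us to try the matrix function $C = G \psi' I_0^{-1}$. Define $V_0 = I_0^{-1}$. With this choice, in the numerator of the van Trees inequality we find the square of $\operatorname{trace} C \psi'^{\top} = \operatorname{trace} G \psi' I_0^{-1} \psi'^{\top} = \operatorname{trace} G_0 V_0 = \mathcal C_{G_0}$. In the main term of the denominator we find $\operatorname{trace} G^{-1} G \psi' I_0^{-1} I_M I_0^{-1} \psi'^{\top} G = \operatorname{trace} I_0^{-1} G_0 I_0^{-1} I_M = \operatorname{trace} K_0 I_M \le \mathcal C^{K_0} = \mathcal C_{G_0}$, by the dual Holevo bound (14). This makes the numerator of the van Trees bound equal to the square of the bound just obtained for this part of the denominator, and using the inequality $a^2/(a + b) \ge a - b$ we find

$$ N \, E_\pi \operatorname{trace} G V \;\ge\; E_\pi \mathcal C_{G_0} - \frac{1}{N} \mathcal J(\pi) \tag{15} $$

where

$$ \mathcal J(\pi) = \int \frac{(C\pi)'^{\top} G^{-1} (C\pi)'}{\pi} \, d\theta \tag{16} $$

with $C = G \psi' V_0$ and $V_0$ uniquely achieving in $\mathcal V$ the bound $\operatorname{trace} G_0 V \ge \mathcal C_{G_0}$, where $G_0 = \psi'^{\top} G \psi'$. Finally, provided $\mathcal J(\pi)$ is finite (which depends on the prior distribution and on properties of the model), we obtain the asymptotic lower bound

$$ \liminf_{N\to\infty} N \, E_\pi \operatorname{trace} G V \;\ge\; E_\pi \mathcal C_{G_0}. \tag{17} $$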
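The elementary inequality used on the way to (15) takes one line. Here $a$ stands for $E_\pi \mathcal C_{G_0}$ (assumed positive) and $b$ for $\mathcal J(\pi)/N \ge 0$; this labelling is ours, for the derivation only.

```latex
\frac{a^2}{a+b} - (a-b)
  \;=\; \frac{a^2 - (a-b)(a+b)}{a+b}
  \;=\; \frac{b^2}{a+b}
  \;\ge\; 0 ,
\qquad a > 0,\; b \ge 0 .
```

Since $b = \mathcal J(\pi)/N \to 0$ when $\mathcal J(\pi)$ is finite, the remainder term in (15) vanishes in the limit, which is exactly the step from (15) to (17).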



4 Q-LAN for i.i.d. models 

In this section we sketch some elements of a theory of comparison and convergence of quantum statistical models, which is currently being developed in analogy to the LeCam theory of classical statistical models. We illustrate the theory with the example of local asymptotic normality for (finite-dimensional) i.i.d. quantum states, which provides a route to proving that the Holevo bound is asymptotically achievable. For more details we refer to the papers Guţă and Kahn (2006), Guţă et al. (2008), Guţă and Jenčová (2007) and Kahn and Guţă (2009) for the i.i.d. case, and to Guţă (2011) for the case of mixing quantum Markov chains.

The Q-LAN theory surveyed here concerns strong local asymptotic normality. Just as in the classical case, the "strong" version of the theory enables us not only to derive asymptotic bounds, but also to actually construct asymptotically optimal statistical procedures, by explicitly lifting the optimal solution of the asymptotic problem back to the finite $N$ situation, where it is approximately optimal. It will be useful to build up theory and applications of the corresponding weak local asymptotic normality concept. A start has been made by Guţă and Jenčová (2007). Such a theory would be easier to apply, and would be sufficient to obtain rigorous asymptotic bounds, but would not contain recipes for how to attain them. At present there are some situations (involving degeneracy) where strong local asymptotic normality is conjectured but not yet proven. It would be interesting to study these analytically tricky problems first using the simpler tools of weak Q-LAN.

4.1 Convergence of classical statistical models 

To facilitate the comparison between classical and quantum, we will start 
with a brief summary of some basic notions from the classical theory of con- 
vergence of statistical models, specialised to the case of dominated models. 

Recall that if $P_\theta$ is a probability distribution on $(\Omega,\Sigma)$ with $\theta\in\Theta$ unknown, then the model $\mathcal{P} = \{P_\theta : \theta\in\Theta\}$ is called dominated if $P_\theta\ll P$ for some measure $P$. We will denote by $p_\theta$ the probability density of $P_\theta$ with respect to $P$. Similarly, let $\mathcal{P}' := \{P'_\theta : \theta\in\Theta\}$ be another model on $(\Omega',\Sigma')$ with densities $p'_\theta = dP'_\theta/dP'$. Then we say that $\mathcal{P}$ and $\mathcal{P}'$ are statistically equivalent (denoted $\mathcal{P}\sim\mathcal{P}'$) if their distributions can be transformed into each other via randomisations, i.e., if there exists a linear transformation

$$R : L^1(\Omega,\Sigma,P)\to L^1(\Omega',\Sigma',P')$$

mapping probability densities into probability densities, such that for all $\theta\in\Theta$

$$R(p_\theta) = p'_\theta,$$

and similarly in the opposite direction. In particular, $S : \Omega\to\Omega'$ is a sufficient statistic for $\mathcal{P}$ if and only if $\mathcal{P}\sim\mathcal{P}'$, where $P'_\theta := P_\theta\circ S^{-1}$.
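As a small classical illustration of this characterisation (our own example, not taken from the source), take $n$ i.i.d. Bernoulli($\theta$) trials and the statistic $S(x) = \sum_i x_i$. The forward randomisation is the deterministic map $x\mapsto S(x)$; for the reverse one we assume the usual kernel that spreads the mass of $s$ uniformly over all sequences with sum $s$. That kernel does not depend on $\theta$, which is exactly why $S$ is sufficient. The helper names are hypothetical.

```python
import itertools
import math


def product_bernoulli(theta, n):
    """Joint pmf p_theta of n i.i.d. Bernoulli(theta) trials, keyed by outcome tuple."""
    return {x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
            for x in itertools.product((0, 1), repeat=n)}


def push_forward_sum(p):
    """Density of P'_theta = P_theta o S^{-1} for the statistic S(x) = sum(x)."""
    q = {}
    for x, prob in p.items():
        q[sum(x)] = q.get(sum(x), 0.0) + prob
    return q


def randomise_back(q, n):
    """Theta-free Markov kernel: given s, spread mass uniformly over the comb(n, s) sequences with sum s."""
    return {x: q[sum(x)] / math.comb(n, sum(x))
            for x in itertools.product((0, 1), repeat=n)}


def reconstruction_error(theta, n):
    """Max |R(p'_theta) - p_theta| over outcomes; zero up to rounding when S is sufficient."""
    p = product_bernoulli(theta, n)
    p_back = randomise_back(push_forward_sum(p), n)
    return max(abs(p[x] - p_back[x]) for x in p)


for theta in (0.1, 0.37, 0.8):
    print(theta, reconstruction_error(theta, n=5))
```

Because the same kernel works for every $\theta$, the two models are equivalent in the sense just defined.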

In asymptotics one often needs to show that a sequence of models con- 
verges to a limit model without being statistically equivalent to it at any 
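point. This can be formulated by using LeCam's notion of deficiency and the associated distance on the space of statistical models. The deficiency of $\mathcal{P}$ with respect to $\mathcal{P}'$ (expressed here in $L^1$ rather than total variation norm) is

$$\delta(\mathcal{P},\mathcal{P}') := \inf_R\sup_{\theta\in\Theta}\|R(p_\theta)-p'_\theta\|_1$$

where the infimum is taken over all randomisations $R$. The LeCam distance between $\mathcal{P}$ and $\mathcal{P}'$ is defined as

$$\Delta(\mathcal{P},\mathcal{P}') := \max\big(\delta(\mathcal{P},\mathcal{P}'),\delta(\mathcal{P}',\mathcal{P})\big),$$

and is equal to zero if and only if the models are equivalent. A sequence of models $\mathcal{P}^{(n)}$ converges strongly to $\mathcal{P}$ if

$$\lim_{n\to\infty}\Delta(\mathcal{P}^{(n)},\mathcal{P}) = 0.$$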

This can be used to prove the convergence of optimal procedures and risks 
for statistical decision problems. We illustrate this with the example of local 
asymptotic normality (LAN) for i.i.d. parametric models, whose quantum 
extension provides an alternative route to optimal estimation in quantum 
statistics. Suppose that $\mathcal{P}$ is a model over an open set $\Theta\subset\mathbb{R}^k$ and that $p_\theta^{1/2}$ depends sufficiently smoothly on $\theta$ (e.g., is differentiable in quadratic mean), and consider the local i.i.d. models around $\theta_0$ with local parameter $h$

$$\mathcal{P}^{(n)} := \{P^n_{\theta_0+h/\sqrt{n}} : \|h\|<c\}.$$

LAN means that $\mathcal{P}^{(n)}$ converges strongly to the Gaussian shift model consisting of a single sample from a $k$-variate normal distribution with mean $h$ and variance equal to the inverse Fisher information matrix of the original model at $\theta_0$

$$\mathcal{N} := \{N(h,I_{\theta_0}^{-1}) : \|h\|<c\}.$$
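For intuition, the one-parameter Bernoulli model (our choice of example) can be checked by exact summation: under $\theta_n = \theta_0+h/\sqrt{n}$ the statistic $Z_n = \sqrt{n}(\bar X_n-\theta_0)$ has mean exactly $h$ and variance $\theta_n(1-\theta_n)$, which tends to $I_{\theta_0}^{-1} = \theta_0(1-\theta_0)$. This only checks the first two moments of the limit experiment; it is a sketch, not a proof of LAN.

```python
import math


def local_statistic_moments(theta0, h, n):
    """Exact mean and variance of Z_n = sqrt(n) * (mean(X) - theta0) when X_i ~ Bernoulli(theta0 + h/sqrt(n))."""
    theta_n = theta0 + h / math.sqrt(n)
    log_t, log_1mt = math.log(theta_n), math.log1p(-theta_n)
    mean = second = 0.0
    for s in range(n + 1):
        # Binomial pmf computed in log space to avoid overflow for large n.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
                   + s * log_t + (n - s) * log_1mt)
        prob = math.exp(log_pmf)
        z = math.sqrt(n) * (s / n - theta0)
        mean += prob * z
        second += prob * z * z
    return mean, second - mean ** 2


theta0, h = 0.3, 1.0
inverse_fisher = theta0 * (1 - theta0)   # I_{theta0}^{-1} for the Bernoulli model
for n in (100, 10_000):
    mean, var = local_statistic_moments(theta0, h, n)
    print(n, round(mean, 6), round(var, 6), inverse_fisher)
```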

4.2 Convergence of quantum statistical models 

As we have seen, an important problem in quantum statistics is to find the 
most informative measurement for a given quantum statistical model and a 
given decision problem. A partial solution to this problem is provided by 
the quantum Cramer-Rao theory which aims to construct lower bounds to 
the quadratic risk of any estimator, expressed solely in terms of the proper- 
ties of the quantum states. Classical mathematical statistics suggests that 
rather than searching for optimal decisions, more insight could be gained 
by analysing the structure of the quantum statistical models themselves, 
beyond the notion of quantum Fisher information. Therefore we will start 
by addressing a more basic question of how to decide whether two quantum 
models over a parameter space are statistically equivalent, or close to each 
other in a statistical sense. To answer this question we will introduce the 
notion of quantum channel, which is a transformation of quantum states 
that could - in principle - be physically implemented in a lab, and should 
be seen as the analog of a classical randomisation which defines a particular 
data processing procedure. The simplest example of such a transformation is a unitary channel which rotates a state ($d\times d$ density matrix $\rho$) by means of a $d\times d$ unitary matrix $U$, i.e.,

$$\mathcal{U} : \rho\mapsto U\rho U^*.$$

Since $\mathcal{U}$ can be reversed by applying the inverse unitary $U^{-1} = U^*$, we anticipate that it will map any quantum model into an equivalent one. More generally, a quantum channel $\mathcal{C} : M(\mathbb{C}^d)\to M(\mathbb{C}^k)$ must satisfy the minimal requirement of being a positive and trace-preserving linear map, i.e., it must transform quantum states into quantum states in an affine way, similarly to the action of a classical randomisation. However, unlike the classical case, it turns out that this condition needs to be strengthened to the requirement that $\mathcal{C}$ is completely positive, i.e., the amplified maps

$$\mathcal{C}\otimes\mathrm{Id}_n : M(\mathbb{C}^d)\otimes M(\mathbb{C}^n)\to M(\mathbb{C}^k)\otimes M(\mathbb{C}^n)$$

must be positive for all $n>0$, where $\mathrm{Id}_n$ is the identity transformation on $M(\mathbb{C}^n)$. An example of a positive but not completely positive, and hence unphysical, transformation is the transposition $t : M(\mathbb{C}^d)\to M(\mathbb{C}^d)$ with respect to a given basis. Indeed, the reader can verify that applying $t\otimes\mathrm{Id}_d$ to any pure entangled state (i.e., not a product state $|\psi\rangle\langle\psi|\otimes|\phi\rangle\langle\phi|$) produces a matrix which is not positive, hence not a state.
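The claim about the transposition can be verified numerically. This sketch assumes NumPy and uses the two-qubit Bell state $(|00\rangle+|11\rangle)/\sqrt2$ as the entangled example; the full transpose of the state is still a state, but $t\otimes\mathrm{Id}_2$ produces a negative eigenvalue, while a product state stays positive.

```python
import numpy as np


def transpose_first_factor(rho, d):
    """Apply (t ⊗ Id_d) to a (d^2 x d^2) matrix, where t transposes the first tensor factor."""
    r = rho.reshape(d, d, d, d)            # r[i, j, k, l] = <i j| rho |k l>
    return r.transpose(2, 1, 0, 3).reshape(d * d, d * d)


def min_eigenvalue(a):
    return np.linalg.eigvalsh(a).min()


# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())

# A product state |psi> ⊗ |phi> with complex amplitudes.
psi = np.array([1, 1j]) / np.sqrt(2)
phi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
prod = np.kron(psi, phi)
rho_prod = np.outer(prod, prod.conj())

print(min_eigenvalue(rho_bell.T))                            # full transpose: still a state
print(min_eigenvalue(transpose_first_factor(rho_bell, 2)))   # t ⊗ Id on the Bell state: negative
print(min_eigenvalue(transpose_first_factor(rho_prod, 2)))   # product state: stays positive
```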

Definition 1. A linear map $\mathcal{C} : M(\mathbb{C}^d)\to M(\mathbb{C}^k)$ which is completely positive and trace preserving is called a quantum channel.



The Stinespring-Kraus Theorem (see Nielsen and Chuang) says that a linear map $\mathcal{C} : M(\mathbb{C}^d)\to M(\mathbb{C}^k)$ is completely positive if and only if it is of the form

$$\mathcal{C}(\rho) = \sum_{i=1}^{dk}K_i\rho K_i^*,$$

with $K_i$ linear transformations from $\mathbb{C}^d$ to $\mathbb{C}^k$, some of which may be equal to zero. Moreover, $\mathcal{C}$ is trace preserving if and only if $\sum_iK_i^*K_i = \mathbf{1}_d$. In particular, if the sum consists of a single non-zero term $V\rho V^*$, the action of the channel $\mathcal{C}$ is to embed the state $\rho$ isometrically into the $d$-dimensional subspace $\mathrm{Ran}(V)\subset\mathbb{C}^k$. As in the unitary case, it is easy to see that this action is reversible (hence noiseless) and maps any statistical model into an equivalent one. We are now ready to define the notion of equivalence of statistical models, as an extension of the classical characterisation.

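Definition 2. Let $\mathcal{Q} := \{\rho(\theta)\in M(\mathbb{C}^d) : \theta\in\Theta\}$ and $\mathcal{R} := \{\varphi(\theta)\in M(\mathbb{C}^k) : \theta\in\Theta\}$ be two quantum statistical models over $\Theta$. Then $\mathcal{Q}$ is statistically equivalent to $\mathcal{R}$ if there exist quantum channels $\mathcal{T} : M(\mathbb{C}^d)\to M(\mathbb{C}^k)$ and $\mathcal{S} : M(\mathbb{C}^k)\to M(\mathbb{C}^d)$ such that for all $\theta\in\Theta$

$$\mathcal{T}(\rho(\theta)) = \varphi(\theta)\quad\text{and}\quad\mathcal{S}(\varphi(\theta)) = \rho(\theta).$$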
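A minimal numerical sketch of the Kraus form and of this definition, assuming NumPy. The amplitude-damping channel (a standard textbook example, not discussed in the source) illustrates the trace-preservation condition $\sum_iK_i^*K_i = \mathbf{1}_d$. The isometry $V:\mathbb{C}^2\to\mathbb{C}^3$ gives a channel $\mathcal{T}$ with a recovery channel $\mathcal{S}$ such that $\mathcal{S}\circ\mathcal{T}$ is the identity on qubit states, so an embedded model is equivalent to the original one. The second Kraus operator of $\mathcal{S}$ is one convenient choice among many.

```python
import numpy as np


def apply_channel(kraus_ops, rho):
    """C(rho) = sum_i K_i rho K_i^*."""
    return sum(k @ rho @ k.conj().T for k in kraus_ops)


def is_trace_preserving(kraus_ops, atol=1e-12):
    """Check sum_i K_i^* K_i = identity on the input space."""
    d_in = kraus_ops[0].shape[1]
    total = sum(k.conj().T @ k for k in kraus_ops)
    return np.allclose(total, np.eye(d_in), atol=atol)


# Amplitude damping on a qubit (damping parameter chosen arbitrarily).
gamma = 0.3
damping = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
           np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

# Isometric embedding V: C^2 -> C^3 (channel T) and a recovery channel S: M(C^3) -> M(C^2).
V = np.array([[1, 0], [0, 1], [0, 0]], dtype=complex)
T = [V]
S = [V.conj().T,                                          # compress back onto Ran(V)
     np.array([[0, 0, 1], [0, 0, 0]], dtype=complex)]     # |0><e_3| handles the orthogonal complement

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
print(is_trace_preserving(damping), is_trace_preserving(T), is_trace_preserving(S))
print(np.allclose(apply_channel(S, apply_channel(T, rho)), rho))
```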

The interpretation of this definition is immediate. Suppose that we want to solve a statistical decision problem concerning the model $\mathcal{R}$, e.g., estimating $\theta$, and we perform a measurement $M$ on the state $\varphi(\theta)$ whose outcome is the estimator $\hat\theta$ with distribution $P^M_\theta = M(\varphi(\theta))$ and risk $R^M_\theta := \mathbb{E}_\theta(d(\hat\theta,\theta)^2)$. Consider now the same problem for the model $\mathcal{Q}$, and define the measurement $N = M\circ\mathcal{T}$ realised by first mapping the quantum states $\rho(\theta)$ through the channel $\mathcal{T}$ into $\varphi(\theta)$, and then performing the measurement $M$. Clearly, the distribution of the obtained outcome is again $P^M_\theta$ and the risk is $R^M_\theta$, so we can say that $\mathcal{Q}$ is at least as informative as $\mathcal{R}$ from a statistical point of view. By repeating the argument in the opposite direction we conclude that any statistical decision problem is equally difficult for the two models, and hence they are equivalent in this sense.
However, unlike the classical case the opposite implication is not true. For instance, models whose states are each other's transpose have the same set of risks for any decision problem but are usually not equivalent in the sense of being connected by quantum channels. It turns out that a full statistical interpretation of Definition 2 is possible if one considers a larger set of quantum decision problems, which do not involve measurements, but quantum channels as statistical procedures.

Until this point we have tacitly assumed that any (finite dimensional) 
quantum model is built upon the algebra of square matrices of a certain di- 
mension. However this setting is too restrictive as it excludes the possibility 
of considering hybrid classical-quantum models, as well as the development 
of a theory of quantum sufficiency. We motivate this extension through the 
following example. We toss a coin whose outcome $X$ has probabilities $p_\theta(1) = \theta$ and $p_\theta(0) = 1-\theta$, and subsequently we prepare a quantum system in the state $\rho_\theta(X)\in M(\mathbb{C}^d)$ which depends on $X$ and the parameter $\theta$. What is the corresponding statistical model? Since the "data" is both classical and quantum, the "state" is a matrix-valued density on $\{0,1\}$

$$\varrho_\theta(i) = p_\theta(i)\rho_\theta(i),\qquad i\in\{0,1\},$$

or equivalently, a block-diagonal density matrix $\varrho_\theta(0)\oplus\varrho_\theta(1)\in M(\mathbb{C}^d)\oplus M(\mathbb{C}^d)$ which is positive and normalised in the usual sense. While this can be seen as a state on the full matrix algebra $M(\mathbb{C}^{2d})$, it is clear that since the off-diagonal blocks have expectation zero for all $\theta$, we can restrict $\varrho_\theta$ to the block-diagonal sub-algebra $M(\mathbb{C}^d)\oplus M(\mathbb{C}^d)$ without losing any statistical information. In other words, the latter is a sufficient algebra of our quantum statistical model. In general, for a model defined on some matrix algebra, one can ask what is the smallest sub-algebra to which we can restrict without losing statistical information, i.e., such that the restricted model is equivalent to the original one in the sense of Definition 2. The theory of quantum sufficiency was developed in Petz and Jenčová, where a number of classical results were extended to the quantum set-up, in particular the fact that the minimal sufficient algebra is generated by the likelihood ratio statistic.

We now make a step further and characterise the "closeness" rather than 
equivalence of quantum statistical models, by generalising LeCam's notion 
of deficiency between models. 

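Definition 3. Let $\mathcal{Q} := \{\rho(\theta)\in M(\mathbb{C}^d) : \theta\in\Theta\}$ and $\mathcal{R} := \{\varphi(\theta)\in M(\mathbb{C}^k) : \theta\in\Theta\}$ be two quantum statistical models over $\Theta$. The deficiency of $\mathcal{R}$ with respect to $\mathcal{Q}$ is defined as

$$\delta(\mathcal{R},\mathcal{Q}) = \inf_{\mathcal{T}}\sup_{\theta\in\Theta}\left\|\mathcal{T}(\rho(\theta))-\varphi(\theta)\right\|_1 \qquad (18)$$

where the infimum is taken over all channels $\mathcal{T} : M(\mathbb{C}^d)\to M(\mathbb{C}^k)$. The LeCam distance between $\mathcal{Q}$ and $\mathcal{R}$ is

$$\Delta(\mathcal{Q},\mathcal{R}) = \max\big(\delta(\mathcal{R},\mathcal{Q}),\delta(\mathcal{Q},\mathcal{R})\big).$$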
This is an extension of the classical definition of deficiency for dominated
statistical models. We will use the LeCam distance to formulate the concept 
of local asymptotic normality for quantum states and find asymptotically 
optimal measurement procedures. 
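Since any particular channel $\mathcal{T}$ gives the upper bound $\sup_\theta\|\mathcal{T}(\rho(\theta))-\varphi(\theta)\|_1$ on the deficiency (18), the infimum can sometimes be pinned down by exhibiting a good channel. A toy sketch, assuming NumPy and a made-up pair of qubit models: $\mathcal{Q}$ consists of real pure states and $\mathcal{R}$ of their depolarised versions $\varphi(\theta) = (1-\lambda)\rho(\theta)+\lambda\mathbf{1}/2$. The identity channel only gives the bound $\lambda$, whereas the depolarising channel maps $\mathcal{Q}$ onto $\mathcal{R}$ exactly, so $\delta(\mathcal{R},\mathcal{Q}) = 0$. The supremum over $\theta$ is approximated on a grid.

```python
import numpy as np


def trace_norm(a):
    """||a||_1 = sum of singular values."""
    return np.linalg.svd(a, compute_uv=False).sum()


def pure_state(theta):
    v = np.array([np.cos(theta), np.sin(theta)], dtype=complex)
    return np.outer(v, v.conj())


def depolarise(rho, lam):
    return (1 - lam) * rho + lam * np.eye(2) / 2


def deficiency_upper_bound(channel, lam, grid):
    """Grid approximation of sup_theta ||channel(rho(theta)) - phi(theta)||_1, an upper bound for delta(R, Q)."""
    return max(trace_norm(channel(pure_state(t)) - depolarise(pure_state(t), lam)) for t in grid)


lam = 0.2
grid = np.linspace(0, np.pi, 101)
print(deficiency_upper_bound(lambda r: r, lam, grid))                   # identity channel
print(deficiency_upper_bound(lambda r: depolarise(r, lam), lam, grid))  # depolarising channel
```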

4.3 Continuous variables systems and quantum Gaussian states 

In this section we introduce the basic concepts associated to continuous 
variables (cv) quantum systems, and then analyse the problem of optimal 
estimation for simple quantum Gaussian shift models.

Firstly we will restrict our attention to the elementary "building block" 
cv system which physically may be a particle moving on the real line, or 
a mono-chromatic light pulse. Then we will show how more complex cv 
systems can be reduced to a tensor product of such "building blocks" by a 
standard "diagonalisation" procedure. 

The Hilbert space of the system is $\mathcal{H} = L^2(\mathbb{R})$ and its quantum states are given by density matrices, i.e., positive operators of trace one. Unlike the finite-dimensional case, their linear span, called the space of trace-class operators $\mathcal{T}_1(\mathcal{H})$, is a proper subspace of the bounded operators on $\mathcal{H}$, and is a Banach space with respect to the trace norm

$$\|\tau\|_1 := \mathrm{Tr}(|\tau|) = \sum_{i=1}^{\infty}s_i,$$

where $s_i$ are the singular values of $\tau$. The key observables are two "canonical coordinates" $Q$ and $P$ representing the position and momentum of the particle, or the electric and magnetic field of the light pulse, and are defined as follows

$$(Qf)(x) = xf(x),\qquad (Pf)(x) = -i\frac{df}{dx}(x). \qquad (19)$$

Although they do not commute with each other, they satisfy Heisenberg's commutation relation, which essentially captures all the algebraic properties of the system:

$$QP-PQ = i\mathbf{1}.$$
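This relation cannot be realised by finite matrices, since $\mathrm{Tr}(QP-PQ) = 0\neq\mathrm{Tr}(i\mathbf{1})$; this is one reason the cv system lives on the infinite-dimensional space $L^2(\mathbb{R})$. A small sketch, assuming NumPy and the standard ladder-operator representation $Q = (a+a^*)/\sqrt2$, $P = (a-a^*)/(i\sqrt2)$ (not introduced in the source) truncated to the first $n$ number states: the commutator equals $i$ on all but the last basis vector, where a compensating entry $-i(n-1)$ appears.

```python
import numpy as np


def truncated_canonical_pair(n):
    """Q and P restricted to span{|0>, ..., |n-1>} via the truncated annihilation operator."""
    a = np.diag(np.sqrt(np.arange(1, n)), k=1).astype(complex)
    q = (a + a.conj().T) / np.sqrt(2)
    p = (a - a.conj().T) / (1j * np.sqrt(2))
    return q, p


n = 6
q, p = truncated_canonical_pair(n)
comm = q @ p - p @ q
print(np.round(np.diag(comm), 10))   # i on the first n-1 entries, -i(n-1) on the last
print(np.trace(comm))                 # the trace of a commutator of matrices is zero
```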

The label "continuous variables" stems from the fact that the probability distributions of $Q$ and $P$ are always absolutely continuous with respect to the Lebesgue measure. Indeed, since any state is a mixture of pure states, it suffices to prove this for a pure state $|\psi\rangle\langle\psi|$. If $\mathbf{Q}$ and $\mathbf{P}$ denote the real-valued random variables representing the outcomes of measuring $Q$ and respectively $P$, then using (19) one can verify that

$$\mathbb{E}(e^{iu\mathbf{Q}}) = \langle\psi,e^{iuQ}\psi\rangle = \int e^{iuq}|\psi(q)|^2\,dq,$$
$$\mathbb{E}(e^{iv\mathbf{P}}) = \langle\psi,e^{ivP}\psi\rangle = \int e^{ivp}|\tilde\psi(p)|^2\,dp,$$

where $\tilde\psi$ is the Fourier transform of $\psi$. This means that $\mathbf{Q}$ and $\mathbf{P}$ have probability densities $|\psi(q)|^2$ and respectively $|\tilde\psi(p)|^2$, and suggests that the cv system should be seen as the non-commutative analogue of an $\mathbb{R}^2$-valued random variable. Following up on this idea we define the "quantum characteristic function" of a state $\rho$

$$\widetilde{W}_\rho(u,v) := \mathrm{Tr}\left(\rho\,e^{-i(uQ+vP)}\right)$$

and the Wigner or "quasidistribution" function

$$W_\rho(q,p) := \frac{1}{(2\pi)^2}\iint e^{i(uq+vp)}\,\widetilde{W}_\rho(u,v)\,du\,dv.$$

These functions have a number of interesting and useful properties, which make them important tools for visualising and analysing states of cv quantum systems.

1. There is a one-to-one correspondence between $\rho$ and $W_\rho$;

2. the Wigner function may take negative values, but its marginal along any direction $\phi$ is a bona fide probability density, corresponding to the measurement of the quadrature observable $Q\cos\phi+P\sin\phi$;

3. both $W_\rho$ and $\widetilde{W}_\rho$ belong to $L^2(\mathbb{R}^2)$, and the following isometry (up to a factor $2\pi$) holds between the space of Hilbert-Schmidt operators $\mathcal{T}_2(L^2(\mathbb{R}))$ and $L^2(\mathbb{R}^2)$:

$$\mathrm{Tr}(\rho A) = 2\pi\iint W_\rho(q,p)\,W_A(q,p)\,dq\,dp.$$
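Properties 2 and 3 can be checked numerically for Gaussian states, whose Wigner functions are ordinary normal densities (cf. Definition 4 below). The sketch assumes NumPy, units in which $[Q,P] = i\mathbf{1}$, the vacuum covariance $V = \mathbf{1}/2$, and a thermal covariance $V = \tfrac32\mathbf{1}$ as an example of a mixed state; integrals are plain Riemann sums on a grid. With $A = \rho$ the isometry gives the purity $\mathrm{Tr}(\rho^2) = 2\pi\iint W_\rho^2$, which should be $1$ for the vacuum and $1/3$ for this thermal state.

```python
import numpy as np


def gaussian_wigner(q, p, cov):
    """Wigner function of a zero-mean Gaussian state: the N(0, cov) density on R^2."""
    inv = np.linalg.inv(cov)
    quad = inv[0, 0] * q**2 + 2 * inv[0, 1] * q * p + inv[1, 1] * p**2
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))


grid = np.linspace(-10, 10, 1001)
step = grid[1] - grid[0]
qq, pp = np.meshgrid(grid, grid, indexing="ij")

for name, cov in [("vacuum", 0.5 * np.eye(2)), ("thermal", 1.5 * np.eye(2))]:
    w = gaussian_wigner(qq, pp, cov)
    total = w.sum() * step**2                     # normalisation
    purity = 2 * np.pi * (w**2).sum() * step**2   # Tr(rho^2) via the isometry
    print(name, round(total, 8), round(purity, 8))

# Marginal in q of the vacuum Wigner function: the N(0, 1/2) density exp(-q^2)/sqrt(pi).
w_vac = gaussian_wigner(qq, pp, 0.5 * np.eye(2))
marginal_q = w_vac.sum(axis=1) * step
```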



We can now introduce the class of quantum Gaussian states by analogy 
to the classical definition. 

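Definition 4. Let $\rho$ be a state with mean $(q,p) = (\mathrm{Tr}(\rho Q),\mathrm{Tr}(\rho P))$ and covariance matrix

$$V = \begin{pmatrix}\mathrm{Tr}(\rho(Q-q)^2) & \mathrm{Tr}(\rho(Q-q)\circ(P-p))\\ \mathrm{Tr}(\rho(Q-q)\circ(P-p)) & \mathrm{Tr}(\rho(P-p)^2)\end{pmatrix},$$

where $A\circ B := (AB+BA)/2$ denotes the symmetrised product. Then $\rho$ is called Gaussian if its characteristic function is

$$\widetilde{W}_\rho(u,v) = \exp\left(-i(uq+vp)-\tfrac12(u,v)\,V\,(u,v)^{T}\right);$$

in particular the Wigner function $W_\rho$ is equal to the probability density of $N(x,V)$, with $x = (q,p)$.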
While the definition looks deceptively similar to that of a classical normal distribution, there are a couple of important differences. The first one is that the covariance matrix $V$ cannot be arbitrary but must satisfy the uncertainty principle

$$\mathrm{Det}(V)\ge\frac14. \qquad (20)$$

This restriction can be traced back to the commutation relation $[Q,P] = i\mathbf{1}$, which says that we cannot assign classical values to $Q$ and $P$ simultaneously. This leads us to the second point, and to the problem of optimal estimation: since $Q$ and $P$ cannot be measured simultaneously, their covariance matrix $V$ is not "achievable" by any measurement aimed at estimating the means $(q,p)$, and the experimenter needs to make a trade-off between measuring $Q$ with high accuracy but ignoring $P$, and vice versa. In the last part of this section we look at this problem in more detail and explain the optimal measurement procedure.

Definition 5. A quantum Gaussian shift model is a family of Gaussian states

$$\mathcal{G} := \{\Phi(x,V) : x\in\mathbb{R}^2\}$$

with unknown mean $x$ and fixed and known covariance matrix $V$. If $G$ is a $2\times2$ positive real weight matrix, the optimal estimation problem is to find the measurement $M$ with outcome $\hat x = (\hat q,\hat p)$ which minimises the maximum quadratic risk

$$R(M) = \sup_{x\in\mathbb{R}^2}\mathbb{E}_x\left((\hat x-x)\,G\,(\hat x-x)^{T}\right). \qquad (21)$$

This is a provisional definition only: a definitive version follows as Definition 6 below. Finding the optimal measurement relies on the equivariance (or covariance, in physics terminology) of the problem with respect to the action of the translation (or displacement) group on the states

$$\mathcal{D}(y) : \Phi(x,V)\mapsto\Phi(x+y,V),\qquad y\in\mathbb{R}^2.$$

This action is implemented by a unitary channel

$$\Phi(x+y,V) = D(y)\,\Phi(x,V)\,D(y)^*,\qquad y = (u,v),$$

where $D(y) = \exp(ivQ-iuP)$ are called the displacement or Weyl operators. Since $R(M)$ is invariant under the transformation $[x,\hat x]\mapsto[x+y,\hat x+y]$, a standard equivariance argument shows that the infimum risk is achieved on the special subset of covariant measurements, defined by the property that displacing the state displaces the outcome distribution by the same amount:

$$P^M_{x+y}(A+y) = P^M_x(A)\quad\text{for all Borel sets } A\subset\mathbb{R}^2 \text{ and } y\in\mathbb{R}^2,$$

where $P^M_x$ denotes the distribution of the outcome of $M$ applied to $\Phi(x,V)$.
Such measurements, and the more general class of covariant quantum channels, have a simple description in terms of linear transformations on the space of coordinates of the system together with an auxiliary system, see Nachtergaele et al. (2011). More specifically, consider an independent quantum cv system with coordinates $(Q',P')$, prepared in a state $\tau$ with zero mean and covariance matrix $Y$. By the commutation relations, the observables $Q+Q'$ and $P-P'$ commute with each other and hence can be measured simultaneously. Since the joint state of the two independent systems is $\Phi(x,V)\otimes\tau$, the outcome $(\hat q,\hat p)$ of the measurement is an unbiased estimator of $(q,p)$ with covariance matrix $V+Y$, and the risk is

$$R(M) = \mathrm{Tr}(G(V+Y)) = \mathrm{Tr}(GV)+\mathrm{Tr}(GY)$$

where the first term is the risk of the corresponding classical problem, and the second is the non-vanishing contribution due to the auxiliary "noisy" system. To find the optimum, it remains to minimise the above expression over all possible covariance matrices of the auxiliary system, which must satisfy the constraint $\mathrm{Det}(Y)\ge1/4$. If $G$ has the form $G = O\,\mathrm{Diag}(g_1,g_2)\,O^{T}$ with $O$ orthogonal, then it can be easily verified that the optimal $Y$ is the matrix

$$Y_0 = \frac12\,O\,\mathrm{Diag}\left(\sqrt{g_2/g_1},\sqrt{g_1/g_2}\right)O^{T}.$$

Moreover, the unique state with such "minimum uncertainty" is the Gaussian state $\tau = \Phi(0,Y_0)$. In conclusion, the minimax risk is

$$R_{\mathrm{minmax}} = \inf_MR(M) = \mathrm{Tr}(GV)+\sqrt{\mathrm{Det}(G)}.$$
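A numerical sanity check of the optimal auxiliary covariance and the minimax risk, assuming NumPy and an arbitrary choice of $G$ and $V$. With $G = O\,\mathrm{Diag}(g_1,g_2)O^T$ obtained from an eigendecomposition, $Y_0 = \frac12 O\,\mathrm{Diag}(\sqrt{g_2/g_1},\sqrt{g_1/g_2})O^T$ has determinant $1/4$ and achieves $\mathrm{Tr}(GY_0) = \sqrt{\mathrm{Det}(G)}$, while randomly drawn minimum-uncertainty covariances (rotated squeezed matrices, our parametrisation) never do better.

```python
import numpy as np


def optimal_auxiliary_covariance(G):
    """Y_0 = (1/2) O Diag(sqrt(g2/g1), sqrt(g1/g2)) O^T for G = O Diag(g1, g2) O^T."""
    g, O = np.linalg.eigh(G)
    return 0.5 * O @ np.diag([np.sqrt(g[1] / g[0]), np.sqrt(g[0] / g[1])]) @ O.T


def minimax_risk(G, V):
    """Tr(GV) + sqrt(Det G): classical term plus the unavoidable quantum penalty."""
    return np.trace(G @ V) + np.sqrt(np.linalg.det(G))


def rotated_squeezed(angle, s):
    """A covariance matrix with Det = 1/4: a rotation of Diag(s, 1/s) / 2."""
    c, d = np.cos(angle), np.sin(angle)
    R = np.array([[c, -d], [d, c]])
    return 0.5 * R @ np.diag([s, 1 / s]) @ R.T


G = np.array([[2.0, 0.5], [0.5, 1.0]])
V = np.array([[1.0, 0.2], [0.2, 0.8]])
Y0 = optimal_auxiliary_covariance(G)
rng = np.random.default_rng(0)
trials = [np.trace(G @ rotated_squeezed(rng.uniform(0, np.pi), rng.uniform(0.1, 10)))
          for _ in range(1000)]
print(np.linalg.det(Y0), np.trace(G @ Y0), np.sqrt(np.linalg.det(G)), min(trials))
print(minimax_risk(G, V))
```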

4.4 General Gaussian shift models and optimal estimation 

We now extend the findings of the previous section from the "building block" 
system to a multidimensional setting. In essence, we show that the Holevo 
bound is achievable for general Gaussian shift models, a result which has 
been known, in various degrees of generality, since the pioneering work of V.P. Belavkin and of A.S. Holevo in the 1970s.

Let us consider a system composed of $p\ge1$ mutually commuting pairs of canonical coordinates $(Q_i,P_i)$, so that the commutation relations hold

$$[Q_i,P_j] = i\delta_{ij}\mathbf{1},\qquad i,j = 1,\dots,p.$$

The joint system can be represented on the Hilbert space $L^2(\mathbb{R})^{\otimes p}$ such that the pair $(Q_i,P_i)$ acts on the $i$-th copy of the tensor product as in (19), and as the identity on the other spaces. Additionally, we allow for a number $l$ of "classical variables" which commute with each other and with all $(Q_i,P_i)$, and can be represented separately as position observables on $l$ additional copies of $L^2(\mathbb{R})$. For simplicity we will denote all variables as

$$(X_1,\dots,X_m) = (Q_1,P_1,\dots,Q_p,P_p,C_1,\dots,C_l),\qquad m = 2p+l,$$

and write their commutation relations as

$$[X_i,X_j] = iS_{ij}\mathbf{1},$$

where $S$ is the $m\times m$ block-diagonal symplectic matrix of the form $S = \mathrm{Diag}(J,\dots,J,0,\dots,0)$ with

$$J = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}.$$

Note that while this may seem to be a rather special cv system, it actually captures the general situation, since any symplectic (bilinear antisymmetric) form can be transformed into the above one by a change of basis.

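The states of this hybrid quantum-classical system are described by positive normalised densities in $\mathcal{T}_1(L^2(\mathbb{R}^p))\otimes L^1(\mathbb{R}^l)$; e.g., if the quantum and classical variables are independent the state is of the form $\rho\otimes f$ with $\rho$ a density matrix and $f$ a probability density. In general the classical and quantum parts may be correlated, and the state is a positive operator-valued density $\varrho : \mathbb{R}^l\to\mathcal{T}_1(L^2(\mathbb{R}^p))$, whose characteristic function can be computed as

$$\mathbb{E}_\varrho\left(e^{i\sum_{j=1}^mu_jX_j}\right) = \int_{\mathbb{R}^l}\mathrm{Tr}\left(\varrho(y)\,e^{i\sum_{j=1}^{2p}u_jX_j}\right)e^{i\sum_{j=1}^{l}u_{2p+j}y_j}\,dy.$$

Definition 6. A state $\Phi(x,V)$ with mean $x\in\mathbb{R}^m$ and $m\times m$ covariance matrix $V$ is Gaussian if

$$\mathbb{E}_{\Phi(x,V)}\left(e^{i\sum_{j=1}^mu_jX_j}\right) = \exp\left(iu^{T}x-\tfrac12u^{T}Vu\right),\qquad u\in\mathbb{R}^m.$$

A Gaussian shift model over the parameter space $\Theta := \mathbb{R}^k$ is a family

$$\mathcal{G} := \{\Phi(Lh,V) : h\in\mathbb{R}^k\}$$

where $L : \mathbb{R}^k\to\mathbb{R}^m$ is a linear map.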

Note that the dimension of the parameter $h$ may be smaller than the dimension of the mean value $x$. One may distinguish full and partial quantum Gaussian shift models: in the full model case, the dimensions are equal (and the matrix $L$ is invertible). A non-classical feature of the general quantum Gaussian shift is that a linear submodel of a full Gaussian shift model is not, in general, equivalent to a full model with lower-dimensional mean vector.

The analogue of the uncertainty principle (20) for general cv systems is the (complex) matrix inequality

$$V\ge\frac{i}{2}S. \qquad (22)$$
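For a single mode ($p = 1$, $l = 0$, so $S = J$) the matrix inequality (22) reduces to (20): $V-\frac{i}{2}J$ is Hermitian with determinant $\mathrm{Det}(V)-\frac14$, so for $V>0$ it is positive semidefinite exactly when $\mathrm{Det}(V)\ge\frac14$. The check below, assuming NumPy and a few hand-picked covariance matrices, compares the two criteria.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])


def satisfies_matrix_uncertainty(V, tol=1e-12):
    """Check V >= (i/2) J, i.e. the Hermitian matrix V - (i/2) J is positive semidefinite."""
    return np.linalg.eigvalsh(V - 0.5j * J).min() >= -tol


def satisfies_det_uncertainty(V, tol=1e-12):
    return np.linalg.det(V) >= 0.25 - tol


examples = {
    "vacuum": 0.5 * np.eye(2),
    "squeezed": np.diag([0.1, 2.5]),
    "thermal": 1.5 * np.eye(2),
    "too squeezed": np.diag([0.1, 2.0]),
    "correlated": np.array([[1.0, 0.9], [0.9, 1.0]]),
}
for name, V in examples.items():
    print(name, satisfies_matrix_uncertainty(V), satisfies_det_uncertainty(V))
```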

The statistical decision problem is to find the measurement which optimally estimates the parameter $h$ of the Gaussian state $\Phi(Lh,V)$, for a mean square error risk with a given $k\times k$ weight matrix $G$, cf. (21). As before, we can restrict our attention to covariant measurements, i.e., to measuring mutually commuting variables of the form

$$W^{(i)} = Y^{(i)}+\tilde Y^{(i)},\qquad i = 1,\dots,k,$$

where

$$Y^{(i)} = \sum_{j=1}^my^{(i)}_jX_j$$

and

$$\tilde Y^{(i)} = \sum_{j=1}^m\tilde y^{(i)}_j\tilde X_j,\qquad\mathbb{E}_\sigma(\tilde Y^{(i)}) = 0.$$

Here $(\tilde X_1,\dots,\tilde X_m)$ are the coordinates of an independent, auxiliary system with symplectic matrix $S$, prepared in a state $\sigma$ with mean zero and covariance matrix $\tilde V$. Let $V^{(Y)}$ and $V^{(\tilde Y)}$ denote the covariance matrices of the independent families of variables $(Y^{(1)},\dots,Y^{(k)})$ and $(\tilde Y^{(1)},\dots,\tilde Y^{(k)})$. Then the risk of the $(W^{(1)},\dots,W^{(k)})$ measurement is

$$R(W) = \mathrm{Tr}(GV^{(Y)})+\mathrm{Tr}(GV^{(\tilde Y)}).$$

On the other hand, since all $W^{(i)}$ must commute with each other, we have

$$[\tilde Y^{(i)},\tilde Y^{(j)}] = -[Y^{(i)},Y^{(j)}] := -iS^{(Y)}_{ij}\mathbf{1}.$$

The uncertainty principle (22) applied to the auxiliary variables $\tilde Y^{(i)}$ gives the constraint

$$V^{(\tilde Y)}\ge-\frac{i}{2}S^{(Y)}.$$

Lemma 1. Let $V$ and $S$ be real symmetric and respectively antisymmetric $k\times k$ matrices, such that $V\ge iS/2$. Then $\mathrm{Tr}(V)\ge\mathrm{Tr}(|S|)/2$, with equality for $V = |S|/2$.
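Lemma 1 can be probed numerically. The sketch below, assuming NumPy and a randomly drawn antisymmetric $S$ with $k = 4$, uses $|S| = (S^TS)^{1/2}$, which coincides with the absolute value of the Hermitian matrix $iS$. It confirms that $V = |S|/2$ satisfies $V\ge iS/2$ with $\mathrm{Tr}(V) = \mathrm{Tr}(|S|)/2$, and that shrinking it by a factor below one violates the constraint.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix via its eigendecomposition."""
    w, u = np.linalg.eigh(a)
    return (u * np.sqrt(np.clip(w, 0, None))) @ u.conj().T


def satisfies_constraint(V, S, tol=1e-10):
    """Check V >= iS/2, i.e. V - iS/2 is positive semidefinite."""
    return np.linalg.eigvalsh(V - 0.5j * S).min() >= -tol


rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
S = A - A.T                       # real antisymmetric
abs_S = psd_sqrt(S.T @ S)         # |S|
V_opt = abs_S / 2

print(satisfies_constraint(V_opt, S), np.trace(V_opt), np.trace(abs_S) / 2)
print(satisfies_constraint(0.9 * V_opt, S))
print(np.abs(np.linalg.eigvalsh(1j * S)).sum(), np.trace(abs_S))
```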

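By optimising the contribution of $V^{(\tilde Y)}$ to the risk and applying the above lemma with a fixed choice of $Y^{(i)}$ we obtain

$$\inf_{V^{(\tilde Y)}}\mathrm{Tr}(GV^{(\tilde Y)}) = \inf_{V^{(\tilde Y)}}\mathrm{Tr}\left(\sqrt{G}\,V^{(\tilde Y)}\sqrt{G}\right) = \frac12\mathrm{Tr}\left|\sqrt{G}\,S^{(Y)}\sqrt{G}\right|,$$

and the infimum is achieved for the covariance matrix

$$V^{(\tilde Y)} = \frac12\,G^{-1/2}\left|\sqrt{G}\,S^{(Y)}\sqrt{G}\right|G^{-1/2},$$

which is only possible if the auxiliary system is prepared in a Gaussian state $\Phi(0,\tilde V)$ (see Leonhardt).

It remains now to optimise the risk over all unbiased $(Y^{(1)},\dots,Y^{(k)})$, i.e., those which satisfy the condition (8) from the formulation of the Holevo bound:

$$\frac{\partial\,\mathbb{E}_{\Phi(Lh,V)}(Y^{(i)})}{\partial h_j} = \delta_{ij},\qquad i,j = 1,\dots,k. \qquad (23)$$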
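The minimax risk is then

$$R_{\mathrm{minmax}}(\mathcal{G},G) = \inf_{\{Y^{(i)}\}}\left(\mathrm{Tr}\left(\sqrt{G}\,V^{(Y)}\sqrt{G}\right)+\frac12\mathrm{Tr}\left|\sqrt{G}\,S^{(Y)}\sqrt{G}\right|\right),$$

which is equal to the Holevo bound (9) if we consider that

$$V^{(Y)}_{i,j} = \mathrm{Re}\,\mathbb{E}_{\Phi(0,V)}(Y^{(i)}Y^{(j)}),\qquad\frac12S^{(Y)}_{i,j} = \mathrm{Im}\,\mathbb{E}_{\Phi(0,V)}(Y^{(i)}Y^{(j)}).$$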
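As a consistency check between this formula and the single-mode result of Section 4.3 (our own check, assuming NumPy and arbitrarily chosen $G$, $V$): for the full single-mode model with $L = \mathbf{1}$, condition (23) forces $Y^{(1)} = Q$ and $Y^{(2)} = P$, so $V^{(Y)} = V$ and $S^{(Y)} = J$, and the minimax formula should reduce to $\mathrm{Tr}(GV)+\sqrt{\mathrm{Det}(G)}$. This relies on the $2\times2$ identity $\sqrt{G}\,J\sqrt{G} = \mathrm{Det}(\sqrt{G})\,J$.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])


def psd_sqrt(a):
    w, u = np.linalg.eigh(a)
    return (u * np.sqrt(np.clip(w, 0, None))) @ u.conj().T


def gaussian_risk(G, V_Y, S_Y):
    """Tr(sqrt(G) V^(Y) sqrt(G)) + (1/2) Tr|sqrt(G) S^(Y) sqrt(G)| for one fixed choice of Y."""
    rG = psd_sqrt(G)
    A = rG @ S_Y @ rG                       # real antisymmetric
    return np.trace(rG @ V_Y @ rG) + 0.5 * np.trace(psd_sqrt(A.T @ A))


G = np.array([[2.0, 0.5], [0.5, 1.0]])
V = np.array([[1.0, 0.2], [0.2, 0.8]])
print(gaussian_risk(G, V, J), np.trace(G @ V) + np.sqrt(np.linalg.det(G)))
```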



4.5 Local asymptotic normality for i.i.d. states 

In this section we show how the general Gaussian shift models discussed 
above emerge from i.i.d. models through local asymptotic normality. 

Suppose that we are given $N$ independent quantum systems prepared identically in an unknown state $\rho\in M(\mathbb{C}^d)$. For large $N$ we can sacrifice a small part of the systems (e.g., $\tilde N = N^{1-\epsilon}$) and use them to construct an estimator $\rho_0$ of the state, by means of a quantum tomography procedure. Using standard concentration inequalities it can be shown that $\rho$ belongs to a neighbourhood of size $N^{-1/2+\epsilon}$ centred at $\rho_0$, with probability converging to one. Therefore, the asymptotic behaviour of parameter estimation problems is determined by the structure of local quantum models around a fixed state $\rho_0$, and from now on we will restrict our attention to such models. By choosing the eigenvectors of $\rho_0$ as the standard basis, and assuming that the eigenvalues satisfy $\mu_1>\dots>\mu_d>0$, we have $\rho_0 = \mathrm{Diag}(\mu_1,\dots,\mu_d)$, and an arbitrary state in its neighbourhood is of the form

$$\rho_h = \begin{pmatrix}\mu_1+u_1 & \zeta^*_{1,2} & \cdots & \zeta^*_{1,d}\\ \zeta_{1,2} & \mu_2+u_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & \zeta^*_{d-1,d}\\ \zeta_{1,d} & \cdots & \zeta_{d-1,d} & \mu_d-\sum_{i=1}^{d-1}u_i\end{pmatrix},\qquad u_i\in\mathbb{R},\ \zeta_{j,k}\in\mathbb{C}, \qquad (24)$$

with local parameter $h = (u,\zeta)\in\mathbb{R}^{d-1}\times\mathbb{C}^{d(d-1)/2}$. The local i.i.d. quantum model around $\rho_0$ is then defined as

$$\mathcal{Q}_N := \left\{\rho^{\otimes N}_{h/\sqrt{N}} : \|h\|\le N^{\epsilon}\right\}. \qquad (25)$$
If some eigenvalues $\mu_i$ are equal to one another or to zero, degeneracies
occur which are tricky to deal with. Completing the theory for such situ- 
ations is a topic of ongoing research. In the rest of this section we give an 
intuitive argument for the emergence of the limit Gaussian model and finish 
with the precise formulation of LAN, restricting attention to the nondegen- 
erate situation. 

We define $m = d^2-1$ operators whose expectation with respect to the state $\rho_0$ is zero, and which together with the identity form a basis of the space of self-adjoint $d\times d$ matrices

$$\{X_1,\dots,X_m\} = \{Q_{1,2},P_{1,2},\dots,Q_{d-1,d},P_{d-1,d},C_1,\dots,C_{d-1}\},$$

where, with $E_{j,k} = |j\rangle\langle k|$ the matrix units,

$$Q_{j,k} = \frac{E_{j,k}+E_{k,j}}{\sqrt{2(\mu_j-\mu_k)}},\qquad P_{j,k} = \frac{i(E_{k,j}-E_{j,k})}{\sqrt{2(\mu_j-\mu_k)}},\qquad C_i = E_{i,i}-\mu_i\mathbf{1}.$$

Let $Q_{j,k}(N)\in M(\mathbb{C}^d)^{\otimes N}$ denote the corresponding collective observables

$$Q_{j,k}(N) = \frac{1}{\sqrt{N}}\sum_{s=1}^NQ^{(s)}_{j,k},$$

with $Q^{(s)}_{j,k}$ acting on position $s$ of the tensor product; similar definitions hold for $P_{j,k}(N)$ and $C_i(N)$. The collective observables play the role of sufficient
statistic for our i.i.d. model, and we would like to understand their asymp- 
totic behaviour. Since all systems are independent and identically prepared, 
and the terms in each collective observable commute, we can apply classical 
Central Limit techniques to show that, under the state $\rho^{\otimes N}_{h/\sqrt{N}}$, we have

$$C_i(N)\rightsquigarrow N\big(u_i,\mu_i(1-\mu_i)\big),\qquad 1\le i\le d-1;$$
$$Q_{j,k}(N)\rightsquigarrow N\big(\mathrm{Re}\,\tilde\zeta_{j,k},v_{j,k}\big),\qquad 1\le j<k\le d;$$
$$P_{j,k}(N)\rightsquigarrow N\big(\mathrm{Im}\,\tilde\zeta_{j,k},v_{j,k}\big),\qquad 1\le j<k\le d,$$

where $\tilde\zeta_{j,k} = \zeta_{j,k}/\sqrt{(\mu_j-\mu_k)/2}$ and $v_{j,k} = (\mu_j+\mu_k)/(2(\mu_j-\mu_k))$. This indicates that the model converges to a Gaussian shift model, but does not tell us what the covariance and commutation relations of the different limit variables are. For this, we need a quantum CLT, that is, a multivariate CLT which takes into account the fact that the collective variables do not commute with each other. Its precise formulation can be found in Ohya and Petz (2004), but for our purposes it is enough to give the following recipe. The limit is a general cv system as described in Section 4.4 with $m = d^2-1$ coordinates $(X_1,\dots,X_m) = (Q_{j,k},P_{j,k},C_i)$ having the commutation relations

$$[X_a,X_b] = \mathrm{Tr}(\rho_0[X_a,X_b])\mathbf{1} = 2i\,\mathrm{Im}\,\mathrm{Tr}(\rho_0X_aX_b)\mathbf{1},$$

whose state is Gaussian with covariance matrix

$$V_{a,b} = \mathrm{Tr}\big(\rho_0(X_aX_b+X_bX_a)/2\big) = \mathrm{Re}\,\mathrm{Tr}(\rho_0X_aX_b).$$

It can easily be verified that, thanks to our special choice of basis, the $(Q_{j,k},P_{j,k})$ are pairs of position and momentum operators, which commute with all other coordinates, and the $C_i$ are "classical" variables, cf. Section 4.4.



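Moreover, the covariance matrix is block diagonal, with each pair $(Q_{j,k},P_{j,k})$ having the $2\times2$ covariance matrix $V^{j,k} = \frac{\mu_j+\mu_k}{2(\mu_j-\mu_k)}\mathbf{1}_2$ and no correlation with the other coordinates, while the classical variables have covariance matrix

$$V^{\mathrm{cl}}_{i,j} = \delta_{i,j}\mu_i-\mu_i\mu_j,\qquad i,j = 1,\dots,d-1.$$

In summary, the limit Gaussian model consists of a tensor product between a Gaussian probability density and a density matrix of $d(d-1)/2$ independent quantum Gaussian states

$$G(h,\mu) := N(u,V^{\mathrm{cl}})\otimes\bigotimes_{j<k}\Phi\big((\mathrm{Re}\,\tilde\zeta_{j,k},\mathrm{Im}\,\tilde\zeta_{j,k}),V^{j,k}\big). \qquad (26)$$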
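The recipe can be checked on a concrete example. The sketch below assumes NumPy, an arbitrarily chosen non-degenerate $\rho_0 = \mathrm{Diag}(0.5,0.3,0.2)$, and our reading of the basis: $Q_{j,k} = (E_{j,k}+E_{k,j})/\sqrt{2(\mu_j-\mu_k)}$, $P_{j,k} = i(E_{k,j}-E_{j,k})/\sqrt{2(\mu_j-\mu_k)}$, $C_i = E_{i,i}-\mu_i\mathbf{1}$, with $E_{j,k} = |j\rangle\langle k|$. It computes $V_{a,b} = \mathrm{Re}\,\mathrm{Tr}(\rho_0X_aX_b)$ and $S_{a,b} = 2\,\mathrm{Im}\,\mathrm{Tr}(\rho_0X_aX_b)$ and confirms the block structure: $\frac{\mu_j+\mu_k}{2(\mu_j-\mu_k)}\mathbf{1}_2$ covariance blocks with symplectic form $J$ for each pair, and the multinomial covariance $V^{\mathrm{cl}}$ with vanishing commutators for the classical part.

```python
import numpy as np


def lan_coordinates(mu):
    """Centred basis (Q_jk, P_jk for j < k, then C_1..C_{d-1}) around rho0 = diag(mu)."""
    d = len(mu)

    def unit(j, k):  # matrix unit E_{j,k} = |j><k|
        e = np.zeros((d, d), dtype=complex)
        e[j, k] = 1.0
        return e

    coords = []
    for j in range(d):
        for k in range(j + 1, d):
            c = np.sqrt(2 * (mu[j] - mu[k]))
            coords.append((unit(j, k) + unit(k, j)) / c)        # Q_{j,k}
            coords.append(1j * (unit(k, j) - unit(j, k)) / c)   # P_{j,k}
    for i in range(d - 1):
        coords.append(unit(i, i) - mu[i] * np.eye(d))           # C_i
    return coords


def limit_covariance_and_symplectic(mu):
    """V_ab = Re Tr(rho0 X_a X_b) and S_ab = 2 Im Tr(rho0 X_a X_b)."""
    rho0 = np.diag(mu).astype(complex)
    X = lan_coordinates(mu)
    gram = np.array([[np.trace(rho0 @ xa @ xb) for xb in X] for xa in X])
    return gram.real, 2 * gram.imag


mu = np.array([0.5, 0.3, 0.2])
V, S = limit_covariance_and_symplectic(mu)
print(np.round(V, 4))
print(np.round(S, 4))
```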

We can now formulate the LAN Theorem, which shows that the i.i.d. model can be asymptotically approximated by the Gaussian one, by means of quantum-classical randomisations, as discussed in Section 4.2. An alternative approach based on a generalisation of the notion of weak convergence of models can be found in Guţă and Jenčová (2007).

Theorem 3. Let $\mathcal{Q}_N$ be the i.i.d. quantum model (25) and let

$$\mathcal{G}_N := \{G(h,\mu) : \|h\|\le N^{\epsilon}\}$$

be the Gaussian model with $G(h,\mu)$ defined in (26). Then there exist channels (completely positive, normalised maps) $T_N$ and $S_N$, mapping states of the $N$ systems to states of the limit classical-quantum system and vice versa, such that

$$\lim_{N\to\infty}\Delta(\mathcal{Q}_N,\mathcal{G}_N) = 0,$$

where $\Delta(\cdot,\cdot)$ is the LeCam distance, cf. Definition 3.

Clearly, in the same i.i.d. setting, smooth lower-dimensional submodels 
of the model of a completely unknown state converge to a partial Gaussian 
shift model. 



4.6 Asymptotic attainability of the Holevo bound 

Besides its theoretical importance, local asymptotic normality has been used as a tool for solving various asymptotic problems such as optimal quantum learning (Guţă and Kotłowski, 2010), teleportation benchmarks (Guţă et al., 2010) and quantum state purification (Bowles et al., 2011). Here we give a short non-technical argument for the asymptotic attainability of the Holevo bound for i.i.d. models, using local asymptotic normality.

In Section 4.4 we showed that the Holevo bound is attained for arbitrary classical-quantum Gaussian shift models. We then saw that the model of $N$ i.i.d. systems prepared in a completely unknown state converges locally to a Gaussian shift model with $d^2-1$ parameters. If some prior information about the state of the systems is available, we consider a lower-dimensional model $\rho_\theta\in M(\mathbb{C}^d)$ with $\theta\in\Theta\subset\mathbb{R}^k$. By applying LAN to this submodel of the "full" one, we find that it is approximated in the LeCam sense by a Gaussian shift of the form

$$\mathcal{G}' = \{G(Lh',\mu) : h'\in\mathbb{R}^k\}$$

where $L : \mathbb{R}^k\to\mathbb{R}^{d^2-1}$ is a linear map which depends only on the local behaviour of the restricted model around $\theta_0$. To identify the linear transformation $L$ we recall the correspondence between the collective variables and the limit continuous variables

$$(Lh')_a := \mathbb{E}_{G(Lh',\mu)}(X_a) = \lim_{N\to\infty}\mathrm{Tr}\left(\rho^{\otimes N}_{\theta_0+h'/\sqrt{N}}X_a(N)\right) = \sum_{b=1}^kh'_b\,\mathrm{Tr}\left(\frac{\partial\rho_\theta}{\partial\theta_b}\Big|_{\theta=\theta_0}X_a\right),$$

from which we deduce

$$L_{a,b} = \mathrm{Tr}\left(\frac{\partial\rho_\theta}{\partial\theta_b}\Big|_{\theta=\theta_0}X_a\right).$$
By a technical but otherwise rather standard argument, one can show that the asymptotic minimax risk for the problem of estimating the local parameter $h'$ converges to the minimax risk for the same problem and the Gaussian model $\mathcal{G}'$, where in both cases the loss function is quadratic with weight matrix $G$:

$$\lim_{N\to\infty}\inf_{M_N}\sup_{\|h'\|\le N^{\epsilon}}N\,R(M_N,h') = R_{\mathrm{minmax}}(\mathcal{G}',G).$$

The final step in proving the asymptotic attainability of the Holevo bound for finite-dimensional systems is to observe that its expression coincides with that of the minimax risk deduced in Section 4.4, applied to the Gaussian shift model $\mathcal{G}'$. The optimisation (9) is performed over self-adjoint matrices satisfying the condition (8), which becomes (23) when translated into the cv language. Similarly, the real and imaginary parts of $Z(X)$ become the covariance and the symplectic matrices $V^{(Y)}$ and respectively $S^{(Y)}/2$.
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Appendix: examples 



In the three examples discussed here, the loss function is derived from
a figure of merit for state estimation that is very popular among physicists,
called fidelity. Suppose we wish to estimate a state $\rho = \rho(\theta)$ by $\hat\rho = \rho(\hat\theta)$.
Fidelity measures the closeness of the two states; it takes its maximal value 1
when the estimate and the truth coincide. It is defined as
$\mathrm{Fid}(\hat\rho, \rho) = \bigl(\operatorname{trace}\sqrt{\rho^{1/2}\hat\rho\,\rho^{1/2}}\bigr)^2$
(some authors would call this squared fidelity). When both states are pure, thus
$\rho = |\phi\rangle\langle\phi|$ and $\hat\rho = |\hat\phi\rangle\langle\hat\phi|$ where $\phi$ and $\hat\phi$ are
unit vectors in $\mathbb{C}^d$, we have $\mathrm{Fid}(\hat\rho, \rho) = |\langle\phi|\hat\phi\rangle|^2$. There is an important
characterization of fidelity due to Fuchs (1995) which both explains its meaning
and leads to many important properties. Suppose $M$ is a measurement on
the quantum system. Denote by $M(\rho)$ the probability distribution of the
outcome of the measurement $M$ when applied to a state $\rho$. For two
probability distributions $P$, $\hat P$ on the same sample space, let $p$ and $\hat p$ be their
densities with respect to a dominating measure $\mu$ and define the fidelity
between these probability measures as $\mathrm{Fid}(\hat P, P) = \bigl(\int \hat p^{1/2} p^{1/2}\, d\mu\bigr)^2$. In usual
statistical language, this is the squared Hellinger affinity between the two
probability measures. It turns out that $\mathrm{Fid}(\hat\rho, \rho) = \inf_M \mathrm{Fid}(M(\hat\rho), M(\rho))$.
Thus two states have small fidelity when there is a measurement which
distinguishes them well, in the sense that the Hellinger affinity between the
outcome distributions is small; in other words, the $L_2$ distance between
the root densities of the data under the two models is large.
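A small numerical illustration of this characterization follows. To keep it elementary it assumes pure qubit states and restricts attention to measurements in an orthonormal basis of $\mathbb{C}^2$; the helper names (`inner`, `pure_fidelity`, `classical_fidelity`, `qubit_basis`, `outcome_distribution`) and the two-angle parametrisation of bases are hypothetical choices made only for this sketch. It checks that every sampled basis measurement yields an outcome fidelity at least as large as the quantum fidelity, and that a basis containing $\phi$ itself attains it.

```python
import cmath
import math
import random


def inner(u, v):
    """<u|v> = sum_k conj(u_k) v_k for vectors given as lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def pure_fidelity(phi_hat, phi):
    """Fidelity of two pure states: |<phi_hat|phi>|^2."""
    return abs(inner(phi_hat, phi)) ** 2


def classical_fidelity(p, q):
    """Squared Hellinger affinity (sum_k sqrt(p_k q_k))^2 of two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2


def qubit_basis(t, u):
    """Orthonormal basis of C^2 indexed by two angles (illustrative parametrisation)."""
    b0 = [complex(math.cos(t)), cmath.exp(1j * u) * math.sin(t)]
    b1 = [-cmath.exp(-1j * u) * math.sin(t), complex(math.cos(t))]
    return [b0, b1]


def outcome_distribution(basis, phi):
    """Born-rule probabilities |<b|phi>|^2 of measuring phi in the given basis."""
    return [abs(inner(b, phi)) ** 2 for b in basis]


phi = [1 + 0j, 0j]
phi_hat = [complex(math.cos(0.3)), cmath.exp(1j) * math.sin(0.3)]
quantum = pure_fidelity(phi_hat, phi)

# Sampled basis measurements: outcome fidelity minus quantum fidelity should be >= 0.
rng = random.Random(0)
bases = [qubit_basis(rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi)) for _ in range(1000)]
gaps = [classical_fidelity(outcome_distribution(B, phi_hat), outcome_distribution(B, phi)) - quantum
        for B in bases]
worst_gap = min(gaps)

# The basis {phi, phi_perp} attains the infimum for pure states.
B0 = qubit_basis(0.0, 0.0)
best = classical_fidelity(outcome_distribution(B0, phi_hat), outcome_distribution(B0, phi))
print(quantum, best, worst_gap)
```

Random bases only probe the infimum from above; the equality case relies on the pure-state structure and is not offered as a general way to compute $\mathrm{Fid}(\hat\rho, \rho)$.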

Now suppose states are smoothly parametrized by a vector parameter
$\theta$. Consider the fidelity between two states with nearby parameter values
$\theta$ and $\hat\theta$, and suppose they are measured with the same measurement $M$.
From the relation $\int \hat p^{1/2} p^{1/2}\, d\mu = 1 - \frac12\|\hat p^{1/2} - p^{1/2}\|_2^2$ and a Taylor expansion to
second order one finds $1 - \mathrm{Fid}(\hat P, P) \approx \frac14(\hat\theta - \theta)^\top I_M(\theta)(\hat\theta - \theta)$, where $I_M(\theta)$
is the Fisher information in the outcome of the measurement $M$ on the state
$\rho(\theta)$. We define the Helstrom quantum information matrix $H(\theta)$ by the
analogous relation

$$1 - \mathrm{Fid}(\hat\rho, \rho) \approx \tfrac14(\hat\theta - \theta)^\top H(\theta)(\hat\theta - \theta). \tag{27}$$

It turns out that $H(\theta)$ is the smallest "information matrix" such that $I_M(\theta) \le
H(\theta)$ for all measurements $M$.
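To make the second-order expansion concrete, here is a sketch for a one-parameter classical model. It assumes, purely for illustration, a hypothetical two-outcome measurement whose outcome is Bernoulli with success probability $\theta$, so that $I_M(\theta) = 1/(\theta(1-\theta))$. It checks the root-density identity and shows the ratio $(1 - \mathrm{Fid})/(\frac14 h^2 I_M(\theta))$ close to 1 for a small step $h$.

```python
import math


def affinity(p, q):
    """Hellinger affinity sum_k sqrt(p_k q_k) of two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def outcome_law(theta):
    """Hypothetical two-outcome measurement: outcome 1 with probability theta."""
    return [theta, 1.0 - theta]


theta, h = 0.3, 1e-3
p, p_hat = outcome_law(theta), outcome_law(theta + h)

# Root-density identity: affinity = 1 - (1/2) * ||sqrt(p_hat) - sqrt(p)||^2.
l2_sq = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p_hat, p))
identity_gap = affinity(p_hat, p) - (1.0 - 0.5 * l2_sq)

# Second order: 1 - Fid ~ (1/4) h^2 I(theta), with the Bernoulli Fisher information.
fisher = 1.0 / (theta * (1.0 - theta))
one_minus_fid = 1.0 - affinity(p_hat, p) ** 2
ratio = one_minus_fid / (0.25 * h ** 2 * fisher)
print(identity_gap, ratio)
```

The remainder of the expansion is of order $h^3$, so the deviation of `ratio` from 1 shrinks roughly in proportion to $h$.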

Taking as loss function $l(\theta, \hat\theta) = 1 - \mathrm{Fid}(\rho(\theta), \rho(\hat\theta))$, we would expect (by
a quadratic approximation to the loss) that $\mathbb{E}_\pi C^{\widetilde G}(\theta)$, with $\widetilde G = \frac14 H$, is a sharp asymptotic
lower bound on $N$ times the Bayes risk. We will prove this result for a
number of special cases in which, by a fortuitous circumstance, the
fidelity-loss function is exactly quadratic in a (sometimes rather strange) function of
the parameter. The first two examples concern a two-dimensional quantum
system and are treated in depth in Bagan et al. (2006a); below we just
outline some important features of the application. In the second of those
two examples our asymptotic lower bound is an essential part of a proof of
asymptotic optimality of a certain measurement-and-estimation scheme.

The third example concerns an unknown pure state of arbitrary
dimension. Here we present a short and geometric proof of a surprising but
little known result of Hayashi (1998), which shows that an extraordinarily
simple measurement scheme leads to an asymptotically optimal estimator
(provided the data is processed efficiently). The analysis also links
the previously unconnected Holevo and Gill-Massar bounds (Holevo, 1982;
Gill and Massar, 2000).



Example 1: Completely unknown spin half (d = 2, p = 3)

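Recall that a completely unknown 2-dimensional quantum state can be
written $\rho(\theta) = \frac12(\mathbf{1} + \theta_1\sigma_1 + \theta_2\sigma_2 + \theta_3\sigma_3)$, where $\theta$ lies in the unit ball in $\mathbb{R}^3$.
It turns out that $\mathrm{Fid}(\hat\rho, \rho) = \frac12\bigl(1 + \hat\theta\cdot\theta + (1 - \|\hat\theta\|^2)^{1/2}(1 - \|\theta\|^2)^{1/2}\bigr)$.
Define $\psi(\theta)$ to be the four-dimensional vector obtained by adjoining $(1 - \|\theta\|^2)^{1/2}$
to $\theta_1, \theta_2, \theta_3$. Note that this vector has constant length 1. It follows that
$1 - \mathrm{Fid}(\hat\rho, \rho) = \frac14\|\hat\psi - \psi\|^2$, where $\hat\psi = \psi(\hat\theta)$. This is a quadratic loss function for
estimation of $\psi(\theta)$ with $G = \frac14\mathbf{1}$, where $\mathbf{1}$ is the $4 \times 4$ identity matrix. By Taylor
expansion of both sides, we find that $\frac14 H = \psi'^\top G \psi' = \widetilde G$, and conclude from
Theorem 1 that $N$ times $(1 - \text{mean fidelity})$ is indeed asymptotically lower bounded
by $\mathbb{E}_\pi C^{\widetilde G}(\theta)$.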
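The qubit formulas above can be checked numerically. To avoid a general matrix square root, this sketch assumes the known closed form $\mathrm{Fid}(\hat\rho, \rho) = \operatorname{trace}(\hat\rho\rho) + 2\sqrt{\det\hat\rho\,\det\rho}$, which holds for $2 \times 2$ density matrices only; the function names and the two test points are illustrative. It compares the matrix computation with the Bloch-vector formula and with $1 - \frac14\|\hat\psi - \psi\|^2$.

```python
import math


def qubit_state(theta):
    """rho(theta) = (1 + theta_1 sigma_1 + theta_2 sigma_2 + theta_3 sigma_3) / 2."""
    t1, t2, t3 = theta
    return [[(1 + t3) / 2 + 0j, (t1 - 1j * t2) / 2],
            [(t1 + 1j * t2) / 2, (1 - t3) / 2 + 0j]]


def trace_product(a, b):
    """Real part of trace(a b) for 2x2 complex matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2)).real


def det2(a):
    """Determinant of a 2x2 matrix (real for Hermitian input)."""
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real


def qubit_fidelity(rho_hat, rho):
    """Closed form valid only for 2x2 density matrices (assumed, see lead-in)."""
    return trace_product(rho_hat, rho) + 2 * math.sqrt(max(det2(rho_hat), 0.0) * max(det2(rho), 0.0))


def radial(theta):
    """(1 - ||theta||^2)^(1/2), clipped at 0 on the boundary of the ball."""
    return math.sqrt(max(1 - sum(x * x for x in theta), 0.0))


def bloch_fidelity(theta_hat, theta):
    """Fid = (1 + theta_hat . theta + radial(theta_hat) * radial(theta)) / 2."""
    dot = sum(x * y for x, y in zip(theta_hat, theta))
    return 0.5 * (1 + dot + radial(theta_hat) * radial(theta))


def psi(theta):
    """Adjoin radial(theta) to theta_1, theta_2, theta_3: a unit vector in R^4."""
    return list(theta) + [radial(theta)]


theta, theta_hat = (0.3, -0.2, 0.5), (0.25, -0.1, 0.55)
fid_matrix = qubit_fidelity(qubit_state(theta_hat), qubit_state(theta))
fid_bloch = bloch_fidelity(theta_hat, theta)
sq_dist = sum((x - y) ** 2 for x, y in zip(psi(theta_hat), psi(theta)))
print(fid_matrix, fid_bloch, 1 - 0.25 * sq_dist)
```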

In Bagan, Ballester, Gill, Monras and Muñoz-Tapia (2006a) the exactly
optimal measurement-and-estimation scheme is derived and analysed in the
case of a rotationally invariant prior distribution over the unit ball. The
optimal measurement turns out not to depend on the (arbitrary) radial
part of the prior distribution, and separates into two parts, one used for
estimating the direction of $\theta$, the other for estimating the length
$\|\theta\|$. The Bayes optimal estimator of the length of $\theta$ naturally depends
on the prior. Because of these simplifications it is feasible to compute the
asymptotic value of $N$ times $(1 - \text{optimal Bayes mean fidelity})$, and this
value is $(3 + 2\mathbb{E}_\pi\|\theta\|)/4$.

The Helstrom quantum information matrix $H$ and the Holevo lower
bound $C^{\widetilde G}$ are also computed. It turns out that $C^{\widetilde G}(\theta) = (3 + 2\|\theta\|)/4$.
Our asymptotic lower bound is not only correct but also, as expected, sharp.

The van Trees approach does put some non-trivial conditions on the prior
density $\pi$. The most restrictive conditions are that the density is zero at the
boundary of its support and that the quantity (16) be finite. Within the unit
ball everything is smooth, but there are some singularities at the boundary
of the ball. So our main theorem does not apply directly to many priors
of interest. However, there is an easy approximation argument to extend its
scope, as follows.

Suppose we start with a prior $\pi$ supported by the whole unit ball which
does not satisfy the conditions. For any $\epsilon > 0$ construct a prior $\pi_\epsilon$ which is
smaller than $(1+\epsilon)\pi$ everywhere, and zero for $\|\theta\| > 1 - \delta$ for some $\delta > 0$.
If the original prior $\pi$ is smooth enough we can arrange that $\pi_\epsilon$ satisfies the
conditions of the van Trees inequality and makes (16) finite. $N$ times the
Bayes risk for $\pi_\epsilon$ cannot exceed $1 + \epsilon$ times that for $\pi$, and the same must
also be true for their limits. Finally, $\mathbb{E}_{\pi_\epsilon} C^{\widetilde G} \to \mathbb{E}_\pi C^{\widetilde G}$ as $\epsilon \to 0$.
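The comparison of Bayes risks in this argument is a pointwise inequality integrated against the two priors. Writing, only for this display, $R_N(\theta; M, \hat\theta) \ge 0$ for the fidelity-loss risk of a measurement $M$ and estimator $\hat\theta$ based on $N$ copies (our notation, not used elsewhere), the step can be sketched as:

```latex
\int R_N(\theta; M, \hat\theta)\,\pi_\epsilon(\theta)\,d\theta
  \;\le\; (1+\epsilon)\int R_N(\theta; M, \hat\theta)\,\pi(\theta)\,d\theta
  \quad\text{for all } (M, \hat\theta)
\;\Longrightarrow\;
\liminf_{N\to\infty} N \inf_{M,\hat\theta}\int R_N\,\pi
  \;\ge\; \frac{1}{1+\epsilon}\,\mathbb{E}_{\pi_\epsilon} C^{\widetilde G}(\theta).
```

The implication uses the asymptotic lower bound for the regular prior $\pi_\epsilon$; letting $\epsilon \to 0$ then gives $\mathbb{E}_\pi C^{\widetilde G}(\theta)$ for the original prior.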

Some last remarks on this example. First of all, it is known that only
collective measurements can asymptotically achieve this bound. Separate
measurements on separate systems lead to strictly worse estimators. In fact,
by the same methods one can obtain the sharp asymptotic lower bound 9/4
(independent of the prior) when one allows the measurement on the $n$th system
to depend on the data obtained from the earlier ones; see Bagan, Ballester,
Gill, Muñoz-Tapia and Romero-Isart (2006b). Instead of the Holevo
bound itself, we use here a bound of Gill and Massar (2000), which
actually has the form of a dual Holevo bound. (We give some more remarks
on this at the end of the discussion of the third example.) Secondly, our
result gives strong heuristic support to the claim that the
measurement-and-estimation scheme developed in Bagan, Ballester, Gill, Monras and
Muñoz-Tapia (2006a) for a specific prior and specific loss function is also
pointwise optimal in a minimax sense, or among regular estimators, for loss functions
which are locally equivalent to fidelity-loss; and also asymptotically optimal
in the Bayes sense for other priors and locally equivalent loss functions. In
general, if the physicists' approach is successful in the sense of generating a
measurement-and-estimation scheme which can be analytically studied and
experimentally implemented, then this scheme will have (for large $N$) good
properties independent of the prior and only dependent on local properties
of the loss.

Example 2: Spin half: equatorial plane (d = 2, p = 2)

Bagan, Ballester, Gill, Monras and Muñoz-Tapia (2006a) also considered the
case where it is known that $\theta_3 = 0$, so that we now have a two-dimensional
parameter. The prior is again taken to be rotationally symmetric. The
exactly Bayes optimal measurement turns out (at least for some $N$ and
for some priors) to depend on the radial part of the prior. Analysis of the
exactly optimal measurement-and-estimation procedure is not feasible, since
we do not know whether this phenomenon persists for all $N$. However, there is a
natural measurement, exactly optimal for some $N$ and some priors,
which one might conjecture to be asymptotically optimal for all priors.
This sub-optimal measurement, combined with the Bayes optimal estimator
given the measurement, can be analysed, and it turns out that $N$ times
$(1 - \text{mean fidelity})$ converges to $1/2$ as $N \to \infty$, independently of the prior.
Again, the Helstrom quantum information matrix $H$ and the Holevo lower
bound $C^{\widetilde G}$ are computed. It turns out that $C^{\widetilde G}(\theta) = 1/2$. This time we
can use our asymptotic lower bound to prove that the natural sub-optimal
measurement-and-estimator is in fact asymptotically optimal for this
problem.

For a $p$-parameter model the best one could ever hope for is that for
large $N$ there are measurements with $I_M$ approaching the Helstrom upper
bound $H$. Using this bound in the van Trees inequality gives the asymptotic
lower bound $p/4$ on $N$ times $(1 - \text{mean fidelity})$. The example here is a
special case where this is attainable. Such a model is called quasi-classical.
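The arithmetic behind $p/4$ can be spelled out in the best case $I_M \to H$. The display assumes, as in the examples, the weight $\widetilde G = \frac14 H$, and uses the fact that $I_M \le H$ implies $I_M^{-1} \ge H^{-1}$ for positive definite matrices:

```latex
\operatorname{trace}\bigl(\widetilde G\, I_M^{-1}\bigr)
  \;\ge\; \operatorname{trace}\bigl(\tfrac14 H\, H^{-1}\bigr)
  \;=\; \tfrac14 \operatorname{trace}(\mathbf{1}_p) \;=\; \frac{p}{4},
\qquad p = 2 \;\Rightarrow\; \frac{p}{4} = \frac12 .
```

For $p = 2$ this is the value $C^{\widetilde G}(\theta) = 1/2$ reported for the equatorial model.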

If one restricts attention to separate measurements on separate systems,
the sharp asymptotic lower bound is 1, twice as large; see Bagan, Ballester,
Gill, Muñoz-Tapia and Romero-Isart (2006b).



Example 3: Completely unknown d-dimensional pure state

In this example we use the dual Holevo bound and symmetry
arguments to show that the original Holevo bound for a natural choice of
$G$ (corresponding to fidelity-loss) is attained by an extremely
large class of measurements, including one of the most basic measurements
around, known as "standard tomography".

For a pure state $\rho = |\phi\rangle\langle\phi|$, fidelity can be written $\mathrm{Fid}(\hat\rho, \rho) = |\langle\hat\phi|\phi\rangle|^2$, where
$|\phi\rangle \in \mathbb{C}^d$ is a vector of unit length. The state vector can be multiplied by $e^{i\alpha}$
for an arbitrary real phase $\alpha$ without changing the density matrix. The
constraint of unit length and the arbitrariness of the phase mean that one
can parametrize the density matrix $\rho$ corresponding to $|\phi\rangle$ by $2(d-1)$ real
parameters, which we take to be our underlying vector parameter $\theta$ (we have
$d$ real parts and $d$ imaginary parts of the elements of $|\phi\rangle$, but one constraint
and one parameter which can be fixed arbitrarily).

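For a pure state, $\rho^2 = \rho$, so $\operatorname{trace}(\rho^2) = 1$. Another way to write the
fidelity in this case is as $\operatorname{trace}(\hat\rho\rho) = \sum_{i,j}\bigl(\Re(\hat\rho_{ij})\Re(\rho_{ij}) + \Im(\hat\rho_{ij})\Im(\rho_{ij})\bigr)$. So
if we take $\psi(\theta)$ to be the vector of dimension $2d^2$ and of length 1 containing the
real and the imaginary parts of the elements of $\rho$, we see that $1 - \mathrm{Fid}(\hat\rho, \rho) =
\frac12\|\hat\psi - \psi\|^2$. It follows that $1 - \mathrm{Fid}$ is again a quadratic loss function in $\psi(\theta)$,
now with $G = \frac12\mathbf{1}$.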
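A quick numerical check of these statements for random pure states follows. The dimension $d = 4$, the seed, and the Gaussian construction of random unit vectors are arbitrary choices for illustration. The sketch verifies that $\psi$ has dimension $2d^2$ and length 1, that $1 - \mathrm{Fid} = \frac12\|\hat\psi - \psi\|^2$, and that a global phase $e^{i\alpha}$ leaves $\psi$ unchanged.

```python
import cmath
import math
import random


def random_pure_state(d, rng):
    """Unit vector in C^d from normalised i.i.d. complex Gaussian entries."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(d)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]


def density_matrix(phi):
    """rho = |phi><phi|, i.e. rho_ij = phi_i * conj(phi_j)."""
    return [[a * b.conjugate() for b in phi] for a in phi]


def psi(rho):
    """Real parts, then imaginary parts, of all entries of rho (dimension 2 d^2)."""
    entries = [z for row in rho for z in row]
    return [z.real for z in entries] + [z.imag for z in entries]


rng = random.Random(1)
d = 4
phi, phi_hat = random_pure_state(d, rng), random_pure_state(d, rng)
fid = abs(sum(a.conjugate() * b for a, b in zip(phi_hat, phi))) ** 2

v, v_hat = psi(density_matrix(phi)), psi(density_matrix(phi_hat))
length = math.sqrt(sum(x * x for x in v))
half_sq_dist = 0.5 * sum((x - y) ** 2 for x, y in zip(v_hat, v))

# A global phase on the state vector leaves rho, hence psi, unchanged.
v_phase = psi(density_matrix([cmath.exp(0.7j) * z for z in phi]))
phase_gap = max(abs(x - y) for x, y in zip(v_phase, v))
print(len(v), length, 1 - fid, half_sq_dist, phase_gap)
```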

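Define again the Helstrom quantum information matrix $H(\theta)$ for $\theta$ by
$1 - \mathrm{Fid}(\hat\rho, \rho) \approx \frac14(\hat\theta - \theta)^\top H(\theta)(\hat\theta - \theta)$. Just as in the previous two examples,
we expect the asymptotic lower bound $\mathbb{E}_\pi C^{\widetilde G}(\theta)$ to hold for $N$ times the Bayes
mean fidelity-loss, where $\widetilde G = \frac14 H = \psi'^\top G \psi'$.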

Some striking facts are known about estimation of a pure state. First of
all, from Matsumoto (2002) we know that the Holevo bound is attainable,
for all $G$, already at $N = 1$. Secondly, from Gill and Massar (2000) we have
the following inequality:

$$\operatorname{trace}\bigl(H^{-1} I_M\bigr) \le d - 1, \tag{28}$$

with equality (in the case that the state is completely unknown) for all
exhaustive measurements $M^{(N)}$ on $N$ copies of the state. Exhaustivity means,
for a measurement with discrete outcome space, that $M^{(N)}(\{x\})$ is a rank-one
matrix for each outcome $x$. In general, exhaustivity is defined
by the same property for the density $m(x)$ of the matrix-valued measure
$M^{(N)}$ with respect to a real dominating measure, e.g., $\operatorname{trace}(M^{(N)}(\cdot))$. This
tells us that (28) is one of the "dual Holevo inequalities". We can associate
it with an original Holevo inequality once we know an information matrix
of a measurement attaining the bound. We will show that there is an
information matrix of the form $I_M = cH$ attaining the bound. Since the
number of parameters (and the dimension of $H$) is $2(d-1)$, it follows by imposing
equality in (28) that $c = \frac12$. The corresponding Holevo inequality must
be $\operatorname{trace}\bigl(\tfrac14 H\, I_M^{-1}\bigr) \ge d - 1$, which tells us that $C^{\widetilde G} = d - 1$.
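The two small computations in this paragraph, written out with $\mathbf{1}_{2(d-1)}$ denoting the identity on the $2(d-1)$-dimensional parameter space:

```latex
\operatorname{trace}\bigl(H^{-1}(cH)\bigr) = c\,\operatorname{trace}\bigl(\mathbf{1}_{2(d-1)}\bigr) = 2c(d-1) = d-1
  \;\Longrightarrow\; c = \tfrac12,
\qquad
\operatorname{trace}\bigl(\tfrac14 H\,(\tfrac12 H)^{-1}\bigr) = \tfrac12\,\operatorname{trace}\bigl(\mathbf{1}_{2(d-1)}\bigr) = d-1 .
```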

The proof uses an invariance property of the model. For any unitary
matrix $U$ (i.e., $UU^* = U^*U = \mathbf{1}$) we can convert the pure state $\rho$ into a new
pure state $U\rho U^*$. The unitary matrices form a group under multiplication.
Consequently the group can be thought of as acting on the parameter $\theta$ used
to describe the pure state. Clearly the fidelity between two states (or the
fidelity between their two parameters) is invariant when the same unitary
acts on both states. This group action possesses the "homogeneous two point
property": for any two pairs of states such that the fidelities between the
members of each pair are the same, there is a unitary transforming the first
pair into the second pair.

We illustrate this in the case $d = 2$, where (as in the first example of Section 2)
the pure states can be represented by the surface of the unit ball in $\mathbb{R}^3$. It turns
out that the action of the unitaries on the density matrices translates into
the action of the group of rotations on the unit sphere. Any two
points at a given distance on the sphere can be transformed by some rotation
into any other two points at the same distance from one another, and a constant
distance between points on the sphere corresponds to a constant fidelity
between the underlying states.
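For $d = 2$ the correspondence between distance on the sphere and fidelity can be checked directly. The sketch assumes the standard Bloch-sphere coordinates $|\phi\rangle = (\cos(\vartheta/2), e^{i\varphi}\sin(\vartheta/2))$ (a choice of coordinates, not fixed by the text) and uses the pure-state case $\|\theta\| = 1$ of the Example 1 formula, for which $\mathrm{Fid} = \frac12(1 + \hat n\cdot n) = 1 - \frac14\|\hat n - n\|^2$. Two pairs of points at the same distance, one along a meridian and one along the equator, give the same fidelity.

```python
import cmath
import math


def qubit_from_angles(polar, azimuth):
    """|phi> = (cos(polar/2), e^{i azimuth} sin(polar/2)), a pure qubit state."""
    return [complex(math.cos(polar / 2)), cmath.exp(1j * azimuth) * math.sin(polar / 2)]


def bloch_vector(polar, azimuth):
    """Point on the unit sphere with the same polar and azimuthal angles."""
    return [math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar)]


def pure_fidelity(a, b):
    """|<a|b>|^2 for state vectors given as lists of complex numbers."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2


def fidelity_from_distance(n_hat, n):
    """1 - |n_hat - n|^2 / 4: depends on the Euclidean distance only."""
    return 1 - 0.25 * sum((x - y) ** 2 for x, y in zip(n_hat, n))


# Same angular separation 0.8: along a meridian, and along the equator.
pair_a = [(0.0, 0.0), (0.8, 0.0)]
pair_b = [(math.pi / 2, 0.3), (math.pi / 2, 1.1)]
results = []
for (p1, a1), (p2, a2) in (pair_a, pair_b):
    f = pure_fidelity(qubit_from_angles(p1, a1), qubit_from_angles(p2, a2))
    g = fidelity_from_distance(bloch_vector(p1, a1), bloch_vector(p2, a2))
    results.append((f, g))
print(results)
```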

In general, the pure states of dimension $d$ can be identified with the
Riemannian manifold $\mathbb{C}P^{d-1}$, whose natural Riemannian metric corresponds
locally to fidelity (locally, $1 - \mathrm{Fid}$ is squared Riemannian distance) and
whose isometries correspond to the unitaries. This space possesses the
homogeneous two point property, as we argued above. It is easy to show that
the only Riemannian metrics invariant under isometries on such a space are
proportional to one another. Hence the quadratic forms generating those
metrics with respect to a particular parametrization must also be
proportional to one another.

Consider a measurement whose outcome is actually an estimate of the 
state, and suppose that this measurement is covariant under the unitaries. 
This means that transforming the state by a unitary, doing the measurement 
on the transformed state, and transforming the estimate back by the inverse
of the same unitary, is the same (has the same POVM) as the original mea- 
surement. The information matrix for such a measurement is generated from 
the squared Hellinger affinity between the distributions of the measurement 
outcomes under two nearby states, just as the Helstrom information matrix 
is generated from the fidelity between the states. If the measurement is 
covariant then the Riemannian metric defined by the information matrix of 
the measurement outcome must be invariant under unitary transformations 
of the states. Hence: the information matrix of any covariant measurement 
is proportional to the Helstrom information matrix. 

Exhaustive covariant measurements certainly do exist. A particularly
simple one is that, for each of the $N$ copies of the quantum system, we
independently and uniformly choose a basis of $\mathbb{C}^d$ and perform the simple
measurement (given in an example in Section 2) corresponding to that basis.
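The random-basis scheme is easy to simulate for $d = 2$. This sketch assumes the standard fact that a uniformly (Haar) random basis of $\mathbb{C}^2$ corresponds to a uniformly random measurement axis $u$ on the Bloch sphere, with outcome $\pm 1$ having probability $(1 \pm n\cdot u)/2$ for a state with Bloch vector $n$. The linear-inversion estimate at the end, based on $\mathbb{E}[x\,u] = n/3$, is only a consistency check added here; it is not the efficient information processing the text calls for. All function names, the seed and the sample size are hypothetical.

```python
import math
import random


def uniform_axis(rng):
    """Uniform point on the unit sphere (z uniform on [-1, 1], azimuth uniform)."""
    z = rng.uniform(-1.0, 1.0)
    az = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(max(1.0 - z * z, 0.0))
    return (s * math.cos(az), s * math.sin(az), z)


def measure_one_copy(n, rng):
    """Random-basis measurement of one qubit with Bloch vector n.

    Draw a uniform axis u; the outcome is +1 with probability (1 + n.u)/2, else -1."""
    u = uniform_axis(rng)
    p_plus = 0.5 * (1.0 + sum(a * b for a, b in zip(n, u)))
    return u, (1 if rng.random() < p_plus else -1)


def linear_inversion(data):
    """Crude estimate 3 * mean(x * u), using E[x u] = E[(n.u) u] = n / 3 (not efficient)."""
    sums = [0.0, 0.0, 0.0]
    for u, x in data:
        for k in range(3):
            sums[k] += x * u[k]
    return [3.0 * s / len(data) for s in sums]


rng = random.Random(2024)
n_true = (0.6, 0.0, 0.8)
data = [measure_one_copy(n_true, rng) for _ in range(200_000)]
n_est = linear_inversion(data)
print(n_est)
```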

The first conclusion of all this is that any exhaustive covariant measurement
has information matrix $\bar I_M^{(N)}$ equal to one half the Helstrom information
matrix. All such measurements attain the Holevo bound $\operatorname{trace}\bigl(\tfrac14 H(\bar I_M^{(N)})^{-1}\bigr) \ge
d - 1$. In particular, this holds for the i.i.d. measurement based on repeatedly
choosing a uniformly distributed random basis of $\mathbb{C}^d$.
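For $d = 2$ the factor one half can be verified numerically. For pure qubit states the Example 1 formula gives $1 - \mathrm{Fid} = \frac14\|\hat n - n\|^2$ for Bloch vectors $n, \hat n$, so in tangent coordinates at $n$ the Helstrom matrix is the $2 \times 2$ identity. The sketch assumes that a uniformly random basis corresponds to a uniformly random axis $u$ with outcome probabilities $(1 \pm n\cdot u)/2$, and averages the per-copy Fisher information over axes with a midpoint rule that is uniform on the sphere (our choice of quadrature); the function name is hypothetical.

```python
import math


def averaged_fisher_info(n, direction, grid=200):
    """Per-copy Fisher information along the tangent vector `direction` at Bloch vector n
    for the measurement with P(+1) = (1 + n.u)/2, P(-1) = (1 - n.u)/2, averaged over axes u
    uniform on the sphere (midpoint rule in z = cos(polar) and azimuth)."""
    total = 0.0
    for i in range(grid):
        z = -1.0 + (i + 0.5) * 2.0 / grid
        s = math.sqrt(1.0 - z * z)
        for j in range(grid):
            az = (j + 0.5) * 2.0 * math.pi / grid
            u = (s * math.cos(az), s * math.sin(az), z)
            nu = sum(a * b for a, b in zip(n, u))
            du = sum(a * b for a, b in zip(direction, u))
            probs = (0.5 * (1.0 + nu), 0.5 * (1.0 - nu))
            dprobs = (0.5 * du, -0.5 * du)  # derivatives of probs along `direction`
            total += sum(dp * dp / p for p, dp in zip(probs, dprobs))
    return total / grid ** 2


n = (0.0, 0.0, 1.0)
info_x = averaged_fisher_info(n, (1.0, 0.0, 0.0))
info_y = averaged_fisher_info(n, (0.0, 1.0, 0.0))
info_diag = averaged_fisher_info(n, (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0))
print(info_x, info_y, info_diag)
```

Values of $\frac12$ along $x$, $y$ and the diagonal pin down the full $2 \times 2$ matrix as $\frac12$ times the identity, i.e. one half of $H$ in these coordinates; the check covers only $d = 2$ and a single point $n$.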

The second conclusion is that an asymptotic lower bound on $N$ times
$(1 - \text{mean fidelity})$ is $d - 1$. Now the exactly Bayes optimal
measurement-and-estimation strategy is known to achieve this bound. The measurement
involved is a mathematically elegant collective measurement on the $N$ copies
together, but it is hard to realise in the laboratory. Our results show that one can
expect to attain the bound asymptotically by decent information processing
(maximum likelihood? optimal Bayes with uniform prior and fidelity loss?)
following an arbitrary exhaustive covariant measurement, of which the
simplest to implement is the standard tomography measurement consisting
of an independent random choice of measurement basis for each separate
system.

In Gill and Massar (2000) the same bound as (28) was shown to hold
for separable (and in particular, for adaptive sequential) measurements, also
in the mixed state case. Moreover, in the case $d = 2$, any information
matrix satisfying the bound is attainable already at $N = 1$. This is used
in Bagan et al. (2006b) to obtain sharp asymptotic bounds on mean fidelity
for separable measurements on mixed qubits.